Changelog in Linux kernel 6.6.81

afs: Fix the server_list to unuse a displaced server rather than putting it [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Tue Feb 18 19:22:47 2025 +0000

    afs: Fix the server_list to unuse a displaced server rather than putting it
    
    [ Upstream commit add117e48df4788a86a21bd0515833c0a6db1ad1 ]
    
    When allocating and building an afs_server_list struct object from a VLDB
    record, we look up each server address to get the server record for it -
    but a server may have more than one entry in the record and we discard the
    duplicate pointers.  Currently, however, when we discard, we only put a
    server record, not unuse it - but the lookup got us an active-user count.
    
    The active-user count on an afs_server_list object determines its lifetime
    whereas the refcount keeps the memory backing it around.  Failing to reduce
    the active-user counter prevents the record from being cleaned up and can
    lead to multiple copies being seen - and pointing to deleted afs_cell
    objects and other such things.
    
    Fix this by switching the incorrect 'put' to an 'unuse' instead.
    
    Without this, occasionally, a dead server record can be seen in
    /proc/net/afs/servers and list corruption may be observed:
    
        list_del corruption. prev->next should be ffff888102423e40, but was 0000000000000000. (prev=ffff88810140cd38)
    
    Fixes: 977e5f8ed0ab ("afs: Split the usage count on struct afs_server")
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Marc Dionne <marc.dionne@auristor.com>
    cc: Simon Horman <horms@kernel.org>
    cc: linux-afs@lists.infradead.org
    Link: https://patch.msgid.link/20250218192250.296870-5-dhowells@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
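
Illustration (not part of the commit): a minimal Python sketch of the
two-counter lifetime described in the message above. It assumes a lookup
takes both a reference and an active-user count, and that 'unuse' drops
both while 'put' drops only the reference. ServerRecord, build_server_list
and the method names are hypothetical and do not mirror the kernel's code.

```python
# Toy model of a record with two counters. All names are illustrative.

class ServerRecord:
    def __init__(self, name):
        self.name = name
        self.ref = 1      # keeps the memory around
        self.active = 0   # active users; the record is only retired at zero

    def lookup(self):
        """A lookup hands back a reference and an active-user count."""
        self.ref += 1
        self.active += 1
        return self

    def put(self):
        """Drop only the memory reference."""
        self.ref -= 1

    def unuse(self):
        """Drop the active-user count and the reference taken with it."""
        self.active -= 1
        self.ref -= 1

    def can_be_retired(self):
        return self.active == 0


def build_server_list(server, entries, discard):
    """Look the server up once per record entry and discard duplicates."""
    kept = None
    for _ in range(entries):
        found = server.lookup()
        if kept is None:
            kept = found       # the first pointer stays in the list
        else:
            discard(found)     # a duplicate pointer is dropped
    return kept


buggy = ServerRecord("fs1")
build_server_list(buggy, 2, ServerRecord.put)    # duplicate is only 'put'
buggy.unuse()                                    # the list later goes away

fixed = ServerRecord("fs1")
build_server_list(fixed, 2, ServerRecord.unuse)  # duplicate is 'unused'
fixed.unuse()

print(buggy.active, buggy.can_be_retired())  # 1 False
print(fixed.active, fixed.can_be_retired())  # 0 True
```

In this model the 'put' variant leaves one active-user count behind, so
the record can never be retired, which is the kind of leak the message
describes.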

afs: Make it possible to find the volumes that are using a server [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Thu Nov 2 16:08:43 2023 +0000

    afs: Make it possible to find the volumes that are using a server
    
    [ Upstream commit ca0e79a46097d54e4af46c67c852479d97af35bb ]
    
    Make it possible to find the afs_volume structs that are using an
    afs_server struct to aid in breaking volume callbacks.
    
    The way this is done is that each afs_volume already has an array of
    afs_server_entry records that point to the servers where that volume might
    be found.  An afs_volume backpointer and a list node is added to each entry
    and each entry is then added to an RCU-traversable list on the afs_server
    to which it points.
    
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Marc Dionne <marc.dionne@auristor.com>
    cc: linux-afs@lists.infradead.org
    Stable-dep-of: add117e48df4 ("afs: Fix the server_list to unuse a displaced server rather than putting it")
    Signed-off-by: Sasha Levin <sashal@kernel.org>
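
Illustration (not part of the commit): a rough Python sketch of the
back-pointer arrangement described above, assuming a plain list can stand
in for the RCU-traversable list. The class and function names are
hypothetical and only loosely follow the afs structure names.

```python
# Toy model: each per-volume server entry carries a back-pointer to its
# volume and is linked onto the server it points to.

class Server:
    def __init__(self, name):
        self.name = name
        self.entries = []   # stands in for the RCU-traversable list


class Volume:
    def __init__(self, name, servers):
        self.name = name
        # One entry per server on which this volume might be found.
        self.entries = [ServerEntry(self, s) for s in servers]


class ServerEntry:
    def __init__(self, volume, server):
        self.volume = volume          # back-pointer to the owning volume
        self.server = server
        server.entries.append(self)   # link the entry onto the server


def volumes_using(server):
    """Walk the server's entry list to find the volumes that use it."""
    return [entry.volume.name for entry in server.entries]


fs1, fs2 = Server("fs1"), Server("fs2")
Volume("root.cell", [fs1, fs2])
Volume("user.alice", [fs1])

print(volumes_using(fs1))  # ['root.cell', 'user.alice']
print(volumes_using(fs2))  # ['root.cell']
```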

ALSA: hda/realtek: Add quirks for ASUS ROG 2023 models [+ + +]

Author: Stefan Binding <sbinding@opensource.cirrus.com>
Date:   Mon Dec 18 15:12:17 2023 +0000

    ALSA: hda/realtek: Add quirks for ASUS ROG 2023 models
    
    [ Upstream commit a40ce9f4bdbebfbf55fdd83a5284fbaaf222f0b9 ]
    
    These models use 2xCS35L41 amps with HDA using SPI and I2C.
    All models use Internal Boost.
    Some models also use Realtek Speakers in conjunction with
    CS35L41.
    All models require DSD support to be added inside
    cs35l41_hda_property.c
    
    Signed-off-by: Stefan Binding <sbinding@opensource.cirrus.com>
    Link: https://lore.kernel.org/r/20231218151221.388745-4-sbinding@opensource.cirrus.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Stable-dep-of: 9e7c6779e353 ("ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: hda/realtek: Fix microphone regression on ASUS N705UD [+ + +]

Author: Adrien Vergé <adrienverge@gmail.com>
Date:   Wed Feb 26 14:55:15 2025 +0100

    ALSA: hda/realtek: Fix microphone regression on ASUS N705UD
    
    commit c6557ccf8094ce2e1142c6e49cd47f5d5e2933a8 upstream.
    
    This fixes a regression introduced a few weeks ago in stable kernels
    6.12.14 and 6.13.3. The internal microphone on ASUS Vivobook N705UD /
    X705UD laptops is broken: the microphone appears in userspace (e.g.
    Gnome settings) but no sound is detected.
    I bisected it to commit 3b4309546b48 ("ALSA: hda: Fix headset detection
    failure due to unstable sort").
    
    I figured out the cause:
    1. The initial pins enabled for the ALC256 driver are:
           cfg->inputs == {
             { pin=0x19, type=AUTO_PIN_MIC,
               is_headset_mic=1, is_headphone_mic=0, has_boost_on_pin=1 },
             { pin=0x1a, type=AUTO_PIN_MIC,
               is_headset_mic=0, is_headphone_mic=0, has_boost_on_pin=1 } }
    2. Since 2017 and commits c1732ede5e8 ("ALSA: hda/realtek - Fix headset
       and mic on several ASUS laptops with ALC256") and 28e8af8a163 ("ALSA:
       hda/realtek: Fix mic and headset jack sense on ASUS X705UD"), the
       quirk ALC256_FIXUP_ASUS_MIC is also applied to ASUS X705UD / N705UD
       laptops.
       This added another internal microphone on pin 0x13:
           cfg->inputs == {
             { pin=0x13, type=AUTO_PIN_MIC,
               is_headset_mic=0, is_headphone_mic=0, has_boost_on_pin=1 },
             { pin=0x19, type=AUTO_PIN_MIC,
               is_headset_mic=1, is_headphone_mic=0, has_boost_on_pin=1 },
             { pin=0x1a, type=AUTO_PIN_MIC,
               is_headset_mic=0, is_headphone_mic=0, has_boost_on_pin=1 } }
       I don't know what this pin 0x13 corresponds to. To the best of my
       knowledge, these laptops have only one internal microphone.
    3. Before 2025 and commit 3b4309546b48 ("ALSA: hda: Fix headset
       detection failure due to unstable sort"), the sort function would put
       the microphone of pin 0x1a (the working one) *before* the microphone
       of pin 0x13 (the phantom one).
    4. After this commit 3b4309546b48, the fixed sort function puts the
       working microphone (pin 0x1a) *after* the phantom one (pin 0x13). As
       a result, no sound is detected anymore.
    
    It looks like the quirk ALC256_FIXUP_ASUS_MIC is not needed anymore for
    ASUS Vivobook X705UD / N705UD laptops. Without it, everything works
    fine:
    - the internal microphone is detected and records actual sound,
    - plugging in a jack headset is detected, and the headset can record
      actual sound,
    - unplugging the jack headset makes the system go back to the internal
      microphone, which can record actual sound.
    
    Cc: stable@vger.kernel.org
    Cc: Kuan-Wei Chiu <visitorckw@gmail.com>
    Cc: Chris Chiu <chris.chiu@canonical.com>
    Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort")
    Tested-by: Adrien Vergé <adrienverge@gmail.com>
    Signed-off-by: Adrien Vergé <adrienverge@gmail.com>
    Link: https://patch.msgid.link/20250226135515.24219-1-adrienverge@gmail.com
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15 [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Tue Feb 25 16:45:32 2025 +0100

    ALSA: hda/realtek: Fix wrong mic setup for ASUS VivoBook 15
    
    [ Upstream commit 9e7c6779e3530bbdd465214afcd13f19c33e51a2 ]
    
    ASUS VivoBook 15 with SSID 1043:1460 took an incorrect quirk via the
    pin pattern matching for ASUS (ALC256_FIXUP_ASUS_MIC), resulting in
    two built-in mic pins (0x13 and 0x1b).  This happened to work without
    problems in the past because the right pin (0x1b) was picked up as
    the primary device.  But since we fixed the pin enumeration for
    other bugs, the bogus one (0x13) is picked up as the primary device,
    hence the bug surfaced now.
    
    For addressing the regression, this patch explicitly specifies the
    quirk entry with ALC256_FIXUP_ASUS_MIC_NO_PRESENCE, which sets up only
    the headset mic pin.
    
    Fixes: 3b4309546b48 ("ALSA: hda: Fix headset detection failure due to unstable sort")
    Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219807
    Link: https://patch.msgid.link/20250225154540.13543-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports [+ + +]

Author: Takashi Iwai <tiwai@suse.de>
Date:   Tue Feb 18 12:40:24 2025 +0100

    ALSA: usb-audio: Avoid dropping MIDI events at closing multiple ports
    
    [ Upstream commit a3bdd8f5c2217e1cb35db02c2eed36ea20fb50f5 ]
    
    We fixed the UAF issue in USB MIDI code by canceling the pending work
    at closing each MIDI output device in the commit below.  However, this
    assumed that it's the only one that is tied with the endpoint, and it
    resulted in unexpected data truncations when multiple devices are
    assigned to a single endpoint and opened simultaneously.
    
    For addressing the unexpected MIDI message drops, simply replace
    cancel_work_sync() with flush_work().  The drain callback should have
    been already invoked before the close callback, hence the port->active
    flag must be already cleared.  So this just assures that the pending
    work is finished before freeing the resources.
    
    Fixes: 0125de38122f ("ALSA: usb-audio: Cancel pending work at closing a MIDI substream")
    Reported-and-tested-by: John Keeping <jkeeping@inmusicbrands.com>
    Closes: https://lore.kernel.org/20250217111647.3368132-1-jkeeping@inmusicbrands.com
    Link: https://patch.msgid.link/20250218114024.23125-1-tiwai@suse.de
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
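
Illustration (not part of the commit): a small Python model of the
difference between cancelling and flushing pending work when one of
several ports sharing an endpoint is closed. The Endpoint class and its
methods are hypothetical; they only imitate the documented behaviour of
cancel_work_sync() (pending work is dropped) and flush_work() (pending
work is run to completion).

```python
from collections import deque


class Endpoint:
    """One output endpoint shared by several MIDI ports (toy model)."""

    def __init__(self):
        self.pending = deque()   # queued events, not yet sent
        self.sent = []

    def queue(self, port, event):
        self.pending.append((port, event))

    def cancel_pending(self):
        """Models cancel_work_sync(): pending work is thrown away."""
        self.pending.clear()

    def flush_pending(self):
        """Models flush_work(): pending work is run to completion."""
        while self.pending:
            self.sent.append(self.pending.popleft())


def close_port_a(endpoint, close):
    """Queue events for ports A and B, then close port A."""
    endpoint.queue("A", "note-off")
    endpoint.queue("B", "note-on")   # port B is still open
    close(endpoint)
    return list(endpoint.sent)


cancelled = close_port_a(Endpoint(), Endpoint.cancel_pending)
flushed = close_port_a(Endpoint(), Endpoint.flush_pending)

print(cancelled)  # []
print(flushed)    # [('A', 'note-off'), ('B', 'note-on')]
```

With cancellation, the event queued for port B is lost even though only
port A was closed; with a flush, both events are delivered before the
resources are released.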

ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2 [+ + +]

Author: Dmitry Panchenko <dmitry@d-systems.ee>
Date:   Thu Feb 20 18:15:37 2025 +0200

    ALSA: usb-audio: Re-add sample rate quirk for Pioneer DJM-900NXS2
    
    commit 9af3b4f2d879da01192d6168e6c651e7fb5b652d upstream.
    
    Re-add the sample-rate quirk for the Pioneer DJM-900NXS2. This
    device does not work without setting sample-rate.
    
    Signed-off-by: Dmitry Panchenko <dmitry@d-systems.ee>
    Cc: <stable@vger.kernel.org>
    Link: https://patch.msgid.link/20250220161540.3624660-1-dmitry@d-systems.ee
    Signed-off-by: Takashi Iwai <tiwai@suse.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

amdgpu/pm/legacy: fix suspend/resume issues [+ + +]

Author: chr[] <chris@rudorff.com>
Date:   Wed Feb 12 16:51:38 2025 +0100

    amdgpu/pm/legacy: fix suspend/resume issues
    
    commit 91dcc66b34beb72dde8412421bdc1b4cd40e4fb8 upstream.
    
    The resume path and the irq handler happily race in set_power_state():
    
    * amdgpu_legacy_dpm_compute_clocks() needs lock
    * protect irq work handler
    * fix dpm_enabled usage
    
    v2: fix clang build, integrate Lijo's comments (Alex)
    
    Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2524
    Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c")
    Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
    Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> # on Oland PRO
    Signed-off-by: chr[] <chris@rudorff.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit ee3dc9e204d271c9c7a8d4d38a0bce4745d33e71)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

arm64: dts: rockchip: Disable DMA for uart5 on px30-ringneck [+ + +]

Author: Lukasz Czechowski <lukasz.czechowski@thaumatec.com>
Date:   Tue Jan 21 13:56:04 2025 +0100

    arm64: dts: rockchip: Disable DMA for uart5 on px30-ringneck
    
    commit 5ae4dca718eacd0a56173a687a3736eb7e627c77 upstream.
    
    UART controllers without flow control seem to be unstable when
    DMA is enabled. The issues were reported in this message:
    https://lore.kernel.org/linux-arm-kernel/CAMdYzYpXtMocCtCpZLU_xuWmOp2Ja_v0Aj0e6YFNRA-yV7u14g@mail.gmail.com/
    In the case of the PX30-uQ7 Ringneck SoM, it was noticed that after a
    couple of hours of UART communication a CPU stall occurred,
    leading to the system becoming unresponsive.
    After disabling the DMA, extensive UART communication tests for
    up to two weeks were performed, and no issues were further
    observed.
    The flow control pins for uart5 are not available on PX30-uQ7
    Ringneck, as configured by pinctrl-0, so the DMA nodes were
    removed on SoM dtsi.
    
    Cc: stable@vger.kernel.org
    Fixes: c484cf93f61b ("arm64: dts: rockchip: add PX30-µQ7 (Ringneck) SoM with Haikou baseboard")
    Reviewed-by: Quentin Schulz <quentin.schulz@cherry.de>
    Signed-off-by: Lukasz Czechowski <lukasz.czechowski@thaumatec.com>
    Link: https://lore.kernel.org/r/20250121125604.3115235-3-lukasz.czechowski@thaumatec.com
    Signed-off-by: Heiko Stuebner <heiko@sntech.de>
    [ conflict resolution due to missing (cosmetic) backport of
      4eee627ea59304cdd66c5d4194ef13486a6c44fc]
    Signed-off-by: Quentin Schulz <quentin.schulz@cherry.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

ASoC: es8328: fix route from DAC to output [+ + +]

Author: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Date:   Sat Feb 22 20:39:57 2025 +0100

    ASoC: es8328: fix route from DAC to output
    
    [ Upstream commit 5b0c02f9b8acf2a791e531bbc09acae2d51f4f9b ]
    
    The ES8328 codec driver, which is also used for the ES8388 chip that
    appears to have an identical register map, claims that the output can
    either take the route from DAC->Mixer->Output or through DAC->Output
    directly. To the best of what I could find, this is not true, and
    creates problems.
    
    Without DACCONTROL17 bit index 7 set for the left channel, as well as
    DACCONTROL20 bit index 7 set for the right channel, I cannot get any
    analog audio out on Left Out 2 and Right Out 2 respectively, despite the
    DAPM routes claiming that this should be possible. Furthermore, the same
    is the case for Left Out 1 and Right Out 1, showing that those two don't
    have a direct route from DAC to output bypassing the mixer either.
    
    Those control bits toggle whether the DACs are fed (stale bread?) into
    their respective mixers. If one "unmutes" the mixer controls in
    alsamixer, then sure, the audio output works, but if it doesn't work
    without the mixer being fed the DAC input then evidently it's not a
    direct output from the DAC.
    
    ES8328/ES8388 are seemingly not alone in this. ES8323, which uses a
    separate driver for what appears to be a very similar register map,
    simply flips those two bits on in its probe function, and then pretends
    there is no power management whatsoever for the individual controls.
    Fair enough.
    
    My theory as to why nobody has noticed this up to this point is that
    everyone just assumes it's their fault when they had to unmute an
    additional control in ALSA.
    
    Fix this in the es8328 driver by removing the erroneous direct route,
    then get rid of the playback switch controls and have those bits tied to
    the mixer's widget instead, which until now had no register to play
    with.
    
    Fixes: 567e4f98922c ("ASoC: add es8328 codec driver")
    Signed-off-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
    Link: https://patch.msgid.link/20250222-es8328-route-bludgeoning-v1-1-99bfb7fb22d9@collabora.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP response [+ + +]

Author: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Date:   Fri Feb 14 10:30:25 2025 -0500

    Bluetooth: L2CAP: Fix L2CAP_ECRED_CONN_RSP response
    
    [ Upstream commit b25120e1d5f2ebb3db00af557709041f47f7f3d0 ]
    
    L2CAP_ECRED_CONN_RSP needs to list the DCIDs in the same order as the
    SCIDs were received, but list_add() prepends channels to the list, so
    the response comes out reversed:
    
    > ACL Data RX: Handle 16 flags 0x02 dlen 26
          LE L2CAP: Enhanced Credit Connection Request (0x17) ident 2 len 18
            PSM: 39 (0x0027)
            MTU: 256
            MPS: 251
            Credits: 65535
            Source CID: 116
            Source CID: 117
            Source CID: 118
            Source CID: 119
            Source CID: 120
    < ACL Data TX: Handle 16 flags 0x00 dlen 26
          LE L2CAP: Enhanced Credit Connection Response (0x18) ident 2 len 18
            MTU: 517
            MPS: 247
            Credits: 3
            Result: Connection successful (0x0000)
            Destination CID: 68
            Destination CID: 67
            Destination CID: 66
            Destination CID: 65
            Destination CID: 64
    
    Also make sure the response doesn't include channels that are not in
    BT_CONNECT2, since chan->ident can be set to the same value, as in the
    following trace:
    
    < ACL Data TX: Handle 16 flags 0x00 dlen 12
          LE L2CAP: LE Flow Control Credit (0x16) ident 6 len 4
            Source CID: 64
            Credits: 1
    ...
    > ACL Data RX: Handle 16 flags 0x02 dlen 18
          LE L2CAP: Enhanced Credit Connection Request (0x17) ident 6 len 10
            PSM: 39 (0x0027)
            MTU: 517
            MPS: 251
            Credits: 255
            Source CID: 70
    < ACL Data TX: Handle 16 flags 0x00 dlen 20
          LE L2CAP: Enhanced Credit Connection Response (0x18) ident 6 len 12
            MTU: 517
            MPS: 247
            Credits: 3
            Result: Connection successful (0x0000)
            Destination CID: 64
            Destination CID: 68
    
    Closes: https://github.com/bluez/bluez/issues/1094
    Fixes: 9aa9d9473f15 ("Bluetooth: L2CAP: Fix responding with wrong PDU type")
    Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
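
Illustration (not part of the commit): a Python sketch of the two effects
described above, using the CIDs from the traces. Head insertion stands in
for list_add(); tail insertion is shown as the order-preserving
alternative. The helper names, the BT_CONNECTED state name and the SCID
of the stale channel are assumptions made for this example only.

```python
from collections import deque

BT_CONNECT2 = "BT_CONNECT2"     # state named in the commit message
BT_CONNECTED = "BT_CONNECTED"   # assumed name for an established channel


def add_channels(chan_list, scids, first_dcid, ident, prepend):
    """Create one pending channel per requested SCID."""
    for offset, scid in enumerate(scids):
        chan = {"scid": scid, "dcid": first_dcid + offset,
                "ident": ident, "state": BT_CONNECT2}
        if prepend:
            chan_list.appendleft(chan)   # head insertion, like list_add()
        else:
            chan_list.append(chan)       # tail insertion
    return chan_list


def response_dcids(chan_list, ident, check_state):
    """Collect the DCIDs that would go into the response for this ident."""
    return [c["dcid"] for c in chan_list
            if c["ident"] == ident
            and (not check_state or c["state"] == BT_CONNECT2)]


scids = [116, 117, 118, 119, 120]
reversed_rsp = response_dcids(
    add_channels(deque(), scids, 64, 2, prepend=True), 2, True)
ordered_rsp = response_dcids(
    add_channels(deque(), scids, 64, 2, prepend=False), 2, True)

# An established channel whose ident happens to equal the new request's.
stale = {"scid": 64, "dcid": 64, "ident": 6, "state": BT_CONNECTED}
chans = add_channels(deque([stale]), [70], 68, 6, prepend=False)
unfiltered = response_dcids(chans, 6, check_state=False)
filtered = response_dcids(chans, 6, check_state=True)

print(reversed_rsp)  # [68, 67, 66, 65, 64]
print(ordered_rsp)   # [64, 65, 66, 67, 68]
print(unfiltered)    # [64, 68]
print(filtered)      # [68]
```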

drm/amd/display: Disable PSR-SU on eDP panels [+ + +]

Author: Tom Chung <chiahsuan.chung@amd.com>
Date:   Thu Feb 6 11:31:23 2025 +0800

    drm/amd/display: Disable PSR-SU on eDP panels
    
    commit e8863f8b0316d8ee1e7e5291e8f2f72c91ac967d upstream.
    
    [Why]
    PSR-SU may cause some glitching randomly on several panels.
    
    [How]
    Temporarily disable the PSR-SU and fallback to PSR1 for
    all eDP panels.
    
    Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3388
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Sun peng Li <sunpeng.li@amd.com>
    Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
    Signed-off-by: Roman Li <roman.li@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit 6deeefb820d0efb0b36753622fb982d03b37b3ad)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

drm/amd/display: Fix HPD after gpu reset [+ + +]

Author: Roman Li <Roman.Li@amd.com>
Date:   Wed Feb 12 14:49:36 2025 -0500

    drm/amd/display: Fix HPD after gpu reset
    
    commit 4de141b8b1b7991b607f77e5f4580e1c67c24717 upstream.
    
    [Why]
    DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts.
    So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs,
    HPD gets disabled.
    
    [How]
    Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts
    
    Cc: Mario Limonciello <mario.limonciello@amd.com>
    Cc: Alex Deucher <alexander.deucher@amd.com>
    Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
    Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
    Signed-off-by: Roman Li <Roman.Li@amd.com>
    Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
    Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
    (cherry picked from commit f3dde2ff7fcaacd77884502e8f572f2328e9c745)
    Cc: stable@vger.kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
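
Illustration (not part of the commit): a toy Python model of why an
interrupt enabled without going through the refcount is lost when a
reset-resume helper reprograms every source from its refcount. IrqSource,
enable_directly and reset_resume are hypothetical names; the model
assumes the helper enables exactly those sources whose refcount is
non-zero.

```python
class IrqSource:
    """Toy interrupt source with an enable refcount."""

    def __init__(self):
        self.refcount = 0
        self.hw_enabled = False

    def get(self):
        """Take a reference; the first one enables the interrupt."""
        self.refcount += 1
        if self.refcount == 1:
            self.hw_enabled = True

    def put(self):
        """Drop a reference; the last one disables the interrupt."""
        self.refcount -= 1
        if self.refcount == 0:
            self.hw_enabled = False

    def enable_directly(self):
        """Program the hardware without touching the refcount."""
        self.hw_enabled = True


def reset_resume(sources):
    """Reprogram every source purely from its refcount."""
    for src in sources:
        src.hw_enabled = src.refcount > 0


hpd_direct = IrqSource()
hpd_direct.enable_directly()   # enabled, but refcount stays at zero

hpd_counted = IrqSource()
hpd_counted.get()              # enabled through the refcount

reset_resume([hpd_direct, hpd_counted])

print(hpd_direct.hw_enabled)   # False
print(hpd_counted.hw_enabled)  # True
```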

firmware: cs_dsp: Remove async regmap writes [+ + +]

Author: Richard Fitzgerald <rf@opensource.cirrus.com>
Date:   Tue Feb 25 13:18:42 2025 +0000

    firmware: cs_dsp: Remove async regmap writes
    
    [ Upstream commit fe08b7d5085a9774abc30c26d5aebc5b9cdd6091 ]
    
    Change calls to async regmap write functions to use the normal
    blocking writes so that the cs35l56 driver can use spi_bus_lock() to
    gain exclusive access to the SPI bus.
    
    As this is part of a fix, it makes only the minimal change to swap the
    functions to the blocking equivalents. There's no need to risk
    reworking the buffer allocation logic that is now partially redundant.
    
    The async writes are a 12-year-old workaround for inefficiency of
    synchronous writes in the SPI subsystem. The SPI subsystem has since
    been changed to avoid the overheads, so this workaround should not be
    necessary.
    
    The cs35l56 driver needs to use spi_bus_lock() to prevent bus activity
    while it is soft-resetting the cs35l56. But spi_bus_lock() is
    incompatible with spi_async() calls, which will fail with -EBUSY.
    
    Fixes: 8a731fd37f8b ("ASoC: cs35l56: Move utility functions to shared file")
    Signed-off-by: Richard Fitzgerald <rf@opensource.cirrus.com>
    Link: https://patch.msgid.link/20250225131843.113752-2-rf@opensource.cirrus.com
    Signed-off-by: Mark Brown <broonie@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ftrace: Avoid potential division by zero in function_stat_show() [+ + +]

Author: Nikolay Kuratov <kniv@yandex-team.ru>
Date:   Thu Feb 6 12:01:56 2025 +0300

    ftrace: Avoid potential division by zero in function_stat_show()
    
    commit a1a7eb89ca0b89dc1c326eeee2596f263291aca3 upstream.
    
    Check whether the denominator expression x * (x - 1) * 1000 mod
    {2^32, 2^64} produces zero, and skip the stddev computation in that case.
    
    For now don't care about rec->counter * rec->counter overflow because
    rec->time * rec->time overflow will likely happen earlier.
    
    Cc: stable@vger.kernel.org
    Cc: Wen Yang <wenyang@linux.alibaba.com>
    Cc: Mark Rutland <mark.rutland@arm.com>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Link: https://lore.kernel.org/20250206090156.1561783-1-kniv@yandex-team.ru
    Fixes: e31f7939c1c27 ("ftrace: Avoid potential division by zero in function profiler")
    Signed-off-by: Nikolay Kuratov <kniv@yandex-team.ru>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
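
Illustration (not part of the commit): a short Python sketch of how the
denominator x * (x - 1) * 1000 can wrap to zero in fixed-width
arithmetic. Since 1000 = 8 * 125, the product is a multiple of 2^32 when
x = 2^29, and a multiple of 2^64 when x = 2^61. The function names are
hypothetical, and returning 0 when the division is skipped is an
assumption of this sketch.

```python
def stddev_denominator(counter, bits):
    """x * (x - 1) * 1000 reduced modulo 2**bits."""
    mask = (1 << bits) - 1
    return (counter * (counter - 1) * 1000) & mask


def guarded_stddev(numerator, counter, bits=64):
    """Skip the division when the wrapped denominator is zero."""
    denom = stddev_denominator(counter, bits)
    if denom == 0:
        return 0   # assumed fallback value for this sketch
    return numerator // denom


print(stddev_denominator(3, 32))        # 6000
print(stddev_denominator(1 << 29, 32))  # 0
print(stddev_denominator(1 << 61, 64))  # 0
print(guarded_stddev(12000, 3))         # 2
print(guarded_stddev(12000, 1 << 61))   # 0
```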

gve: set xdp redirect target only when it is available [+ + +]

Author: Joshua Washington <joshwash@google.com>
Date:   Fri Feb 14 14:43:59 2025 -0800

    gve: set xdp redirect target only when it is available
    
    commit 415cadd505464d9a11ff5e0f6e0329c127849da5 upstream.
    
    Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
    default as part of driver initialization, and is never cleared. However,
    this flag differs from others in that it is used as an indicator for
    whether the driver is ready to perform the ndo_xdp_xmit operation as
    part of an XDP_REDIRECT. Kernel helpers
    xdp_features_(set|clear)_redirect_target exist to convey this meaning.
    
    This patch ensures that the netdev is only reported as a redirect target
    when XDP queues exist to forward traffic.
    
    Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
    Cc: stable@vger.kernel.org
    Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com>
    Reviewed-by: Jeroen de Borst <jeroendb@google.com>
    Signed-off-by: Joshua Washington <joshwash@google.com>
    Link: https://patch.msgid.link/20250214224417.1237818-1-joshwash@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Joshua Washington <joshwash@google.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

i2c: ls2x: Fix frequency division register access [+ + +]

Author: Binbin Zhou <zhoubinbin@loongson.cn>
Date:   Thu Feb 20 20:56:12 2025 +0800

    i2c: ls2x: Fix frequency division register access
    
    commit 71c49ee9bb41e1709abac7e2eb05f9193222e580 upstream.
    
    According to the chip manual, the I2C register access type of
    Loongson-2K2000/LS7A is "B", so we can only access registers in byte
    form (readb()/writeb()).
    
    Although Loongson-2K0500/Loongson-2K1000 do not have similar
    constraints, register accesses in byte form also behave correctly.
    
    Also, in hardware, the frequency division registers are defined as two
    separate registers (high 8-bit and low 8-bit), so we just access them
    directly as bytes.
    
    Fixes: 015e61f0bffd ("i2c: ls2x: Add driver for Loongson-2K/LS7A I2C controller")
    Co-developed-by: Hongliang Wang <wanghongliang@loongson.cn>
    Signed-off-by: Hongliang Wang <wanghongliang@loongson.cn>
    Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
    Cc: stable@vger.kernel.org # v6.3+
    Reviewed-by: Andy Shevchenko <andy@kernel.org>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Link: https://lore.kernel.org/r/20250220125612.1910990-1-zhoubinbin@loongson.cn
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
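
Illustration (not part of the commit): a minimal Python sketch of
accessing a 16-bit frequency divider as two separate byte-wide registers,
low byte and high byte. The register offsets PRER_LO and PRER_HI and the
helper names are hypothetical, and a bytearray stands in for the
memory-mapped registers.

```python
PRER_LO = 0x0   # hypothetical offset of the low 8-bit divider register
PRER_HI = 0x1   # hypothetical offset of the high 8-bit divider register


def write_prescaler(regs, value):
    """Write a 16-bit divider as two byte-wide register writes."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("divider must fit in 16 bits")
    regs[PRER_LO] = value & 0xFF          # one byte access
    regs[PRER_HI] = (value >> 8) & 0xFF   # a second byte access


def read_prescaler(regs):
    """Read the divider back with two byte-wide register reads."""
    return regs[PRER_LO] | (regs[PRER_HI] << 8)


regs = bytearray(2)
write_prescaler(regs, 0x1234)

print(hex(regs[PRER_LO]), hex(regs[PRER_HI]))  # 0x34 0x12
print(hex(read_prescaler(regs)))               # 0x1234
```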

i2c: npcm: disable interrupt enable bit before devm_request_irq [+ + +]

Author: Tyrone Ting <kfting@nuvoton.com>
Date:   Thu Feb 20 12:00:29 2025 +0800

    i2c: npcm: disable interrupt enable bit before devm_request_irq
    
    commit dd1998e243f5fa25d348a384ba0b6c84d980f2b2 upstream.
    
    A customer reports a soft lockup issue related to the i2c driver.
    On investigation, the i2c module was doing a tx transfer when the
    bmc machine rebooted in the middle of the i2c transaction, and the
    i2c module kept its status without being reset.
    
    Because of this stale status, the i2c irq handler keeps getting
    triggered as soon as it is registered during kernel boot after the
    bmc machine's warm reboot.
    The continuous triggering is stopped by the soft lockup watchdog timer.
    
    Disable the interrupt enable bit in the i2c module before calling
    devm_request_irq to fix this issue, since the related i2c status bit
    is read-only.
    
    Here is the soft lockup log.
    [   28.176395] watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [swapper/0:1]
    [   28.183351] Modules linked in:
    [   28.186407] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.15.120-yocto-s-dirty-bbebc78 #1
    [   28.201174] pstate: 40000005 (nZcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [   28.208128] pc : __do_softirq+0xb0/0x368
    [   28.212055] lr : __do_softirq+0x70/0x368
    [   28.215972] sp : ffffff8035ebca00
    [   28.219278] x29: ffffff8035ebca00 x28: 0000000000000002 x27: ffffff80071a3780
    [   28.226412] x26: ffffffc008bdc000 x25: ffffffc008bcc640 x24: ffffffc008be50c0
    [   28.233546] x23: ffffffc00800200c x22: 0000000000000000 x21: 000000000000001b
    [   28.240679] x20: 0000000000000000 x19: ffffff80001c3200 x18: ffffffffffffffff
    [   28.247812] x17: ffffffc02d2e0000 x16: ffffff8035eb8b40 x15: 00001e8480000000
    [   28.254945] x14: 02c3647e37dbfcb6 x13: 02c364f2ab14200c x12: 0000000002c364f2
    [   28.262078] x11: 00000000fa83b2da x10: 000000000000b67e x9 : ffffffc008010250
    [   28.269211] x8 : 000000009d983d00 x7 : 7fffffffffffffff x6 : 0000036d74732434
    [   28.276344] x5 : 00ffffffffffffff x4 : 0000000000000015 x3 : 0000000000000198
    [   28.283476] x2 : ffffffc02d2e0000 x1 : 00000000000000e0 x0 : ffffffc008bdcb40
    [   28.290611] Call trace:
    [   28.293052]  __do_softirq+0xb0/0x368
    [   28.296625]  __irq_exit_rcu+0xe0/0x100
    [   28.300374]  irq_exit+0x14/0x20
    [   28.303513]  handle_domain_irq+0x68/0x90
    [   28.307440]  gic_handle_irq+0x78/0xb0
    [   28.311098]  call_on_irq_stack+0x20/0x38
    [   28.315019]  do_interrupt_handler+0x54/0x5c
    [   28.319199]  el1_interrupt+0x2c/0x4c
    [   28.322777]  el1h_64_irq_handler+0x14/0x20
    [   28.326872]  el1h_64_irq+0x74/0x78
    [   28.330269]  __setup_irq+0x454/0x780
    [   28.333841]  request_threaded_irq+0xd0/0x1b4
    [   28.338107]  devm_request_threaded_irq+0x84/0x100
    [   28.342809]  npcm_i2c_probe_bus+0x188/0x3d0
    [   28.346990]  platform_probe+0x6c/0xc4
    [   28.350653]  really_probe+0xcc/0x45c
    [   28.354227]  __driver_probe_device+0x8c/0x160
    [   28.358578]  driver_probe_device+0x44/0xe0
    [   28.362670]  __driver_attach+0x124/0x1d0
    [   28.366589]  bus_for_each_dev+0x7c/0xe0
    [   28.370426]  driver_attach+0x28/0x30
    [   28.373997]  bus_add_driver+0x124/0x240
    [   28.377830]  driver_register+0x7c/0x124
    [   28.381662]  __platform_driver_register+0x2c/0x34
    [   28.386362]  npcm_i2c_init+0x3c/0x5c
    [   28.389937]  do_one_initcall+0x74/0x230
    [   28.393768]  kernel_init_freeable+0x24c/0x2b4
    [   28.398126]  kernel_init+0x28/0x130
    [   28.401614]  ret_from_fork+0x10/0x20
    [   28.405189] Kernel panic - not syncing: softlockup: hung tasks
    [   28.411011] SMP: stopping secondary CPUs
    [   28.414933] Kernel Offset: disabled
    [   28.418412] CPU features: 0x00000000,00000802
    [   28.427644] Rebooting in 20 seconds..
    
    Fixes: 56a1485b102e ("i2c: npcm7xx: Add Nuvoton NPCM I2C controller driver")
    Signed-off-by: Tyrone Ting <kfting@nuvoton.com>
    Cc: <stable@vger.kernel.org> # v5.8+
    Reviewed-by: Tali Perry <tali.perry1@gmail.com>
    Signed-off-by: Andi Shyti <andi.shyti@kernel.org>
    Link: https://lore.kernel.org/r/20250220040029.27596-2-kfting@nuvoton.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

IB/core: Add support for XDR link speed [+ + +]

Author: Or Har-Toov <ohartoov@nvidia.com>
Date:   Wed Sep 20 13:07:40 2023 +0300

    IB/core: Add support for XDR link speed
    
    [ Upstream commit 703289ce43f740b0096724300107df82d008552f ]
    
    Add the new IBTA speed XDR, the rate that was added to the InfiniBand
    spec as part of XDR and that supports a signaling rate of 200Gb.
    
    In order to report that value to rdma-core, add a new u32 field to the
    query_port response.
    
    Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
    Reviewed-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://lore.kernel.org/r/9d235fc600a999e8274010f0e18b40fa60540e6c.1695204156.git.leon@kernel.org
    Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Stable-dep-of: c534ffda781f ("RDMA/mlx5: Fix AH static rate parsing")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

IB/mlx5: Set and get correct qp_num for a DCT QP [+ + +]

Author: Mark Zhang <markzhang@nvidia.com>
Date:   Sun Jan 19 14:39:46 2025 +0200

    IB/mlx5: Set and get correct qp_num for a DCT QP
    
    [ Upstream commit 12d044770e12c4205fa69535b4fa8a9981fea98f ]
    
    When a DCT QP is created on an active lag, its dctc.port is assigned
    in a round-robin way, which is from 1 to dev->lag_port. In this case
    when querying this QP, we may get qp_attr.port_num > 2.
    Fix this by setting qp->port when modifying a DCT QP, and read port_num
    from qp->port instead of dctc.port when querying it.
    
    Fixes: 7c4b1ab9f167 ("IB/mlx5: Add DCT RoCE LAG support")
    Signed-off-by: Mark Zhang <markzhang@nvidia.com>
    Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
    Link: https://patch.msgid.link/94c76bf0adbea997f87ffa27674e0a7118ad92a9.1737290358.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: Add E830 device IDs, MAC type and registers [+ + +]

Author: Paul Greenwalt <paul.greenwalt@intel.com>
Date:   Wed Oct 25 14:41:52 2023 -0700

    ice: Add E830 device IDs, MAC type and registers
    
    [ Upstream commit ba1124f58afd37d9ff155d4ab7c9f209346aaed9 ]
    
    E830 is the 200G NIC family which uses the ice driver.
    
    Add specific E830 registers. Embed macros to use proper register based on
    (hw)->mac_type & name those macros to [ORIGINAL]_BY_MAC(hw). Registers
    only available on one of the macs will need to be explicitly referred to
    as E800_NAME instead of just NAME. PTP is not yet supported.
    
    Co-developed-by: Milena Olech <milena.olech@intel.com>
    Signed-off-by: Milena Olech <milena.olech@intel.com>
    Co-developed-by: Dan Nowlin <dan.nowlin@intel.com>
    Signed-off-by: Dan Nowlin <dan.nowlin@intel.com>
    Co-developed-by: Scott Taylor <scott.w.taylor@intel.com>
    Signed-off-by: Scott Taylor <scott.w.taylor@intel.com>
    Co-developed-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
    Signed-off-by: Pawel Chmielewski <pawel.chmielewski@intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
    Tested-by: Tony Brelinski <tony.brelinski@intel.com>
    Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
    Link: https://lore.kernel.org/r/20231025214157.1222758-2-jacob.e.keller@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 79990cf5e7ad ("ice: Fix deinitializing VF in error path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: add E830 HW VF mailbox message limit support [+ + +]

Author: Paul Greenwalt <paul.greenwalt@intel.com>
Date:   Tue Aug 20 17:26:16 2024 -0400

    ice: add E830 HW VF mailbox message limit support
    
    [ Upstream commit 59f4d59b25aec39a015c0949f4ec235c7a839c44 ]
    
    E830 adds hardware support to prevent the VF from overflowing the PF
    mailbox with VIRTCHNL messages. E830 will use the hardware feature
    (ICE_F_MBX_LIMIT) instead of the software solution ice_is_malicious_vf().
    
    To prevent a VF from overflowing the PF, the PF sets the number of
    messages per VF that can be in the PF's mailbox queue
    (ICE_MBX_OVERFLOW_WATERMARK). When the PF processes a message from a VF,
    the PF decrements the per VF message count using the E830_MBX_VF_DEC_TRIG
    register.
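
    The per-VF accounting described above can be sketched as a toy Python
    model. This is not driver code: the class name, method names and the
    watermark value used in the test are assumptions for illustration, and
    on E830 the counting is done by hardware rather than software.

```python
class MailboxLimiter:
    """Toy model: cap the number of queued mailbox messages per VF."""

    def __init__(self, watermark):
        # Stands in for ICE_MBX_OVERFLOW_WATERMARK (value is illustrative).
        self.watermark = watermark
        self.pending = {}  # vf_id -> messages currently queued for the PF

    def vf_send(self, vf_id):
        """Return True if the message is accepted into the PF queue."""
        count = self.pending.get(vf_id, 0)
        if count >= self.watermark:
            return False  # this VF is at its limit
        self.pending[vf_id] = count + 1
        return True

    def pf_process(self, vf_id):
        """The PF handled one message: decrement that VF's count."""
        if self.pending.get(vf_id, 0) > 0:
            self.pending[vf_id] -= 1
```

    One VF hitting its limit does not affect the budget of other VFs, and
    processing a message frees one slot for the VF that sent it.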
    
    Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
    Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>
    Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Stable-dep-of: 79990cf5e7ad ("ice: Fix deinitializing VF in error path")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ice: Fix deinitializing VF in error path [+ + +]

Author: Marcin Szycik <marcin.szycik@linux.intel.com>
Date:   Mon Feb 24 11:06:41 2025 -0800

    ice: Fix deinitializing VF in error path
    
    [ Upstream commit 79990cf5e7aded76d0c092c9f5ed31eb1c75e02c ]
    
    If ice_ena_vfs() fails after calling ice_create_vf_entries(), it frees
    all VFs without removing them from the snapshot PF-VF mailbox list, leading
    to list corruption.
    
    Reproducer:
      devlink dev eswitch set $PF1_PCI mode switchdev
      ip l s $PF1 up
      ip l s $PF1 promisc on
      sleep 1
      echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
      sleep 1
      echo 1 > /sys/class/net/$PF1/device/sriov_numvfs
    
    Trace (minimized):
      list_add corruption. next->prev should be prev (ffff8882e241c6f0), but was 0000000000000000. (next=ffff888455da1330).
      kernel BUG at lib/list_debug.c:29!
      RIP: 0010:__list_add_valid_or_report+0xa6/0x100
       ice_mbx_init_vf_info+0xa7/0x180 [ice]
       ice_initialize_vf_entry+0x1fa/0x250 [ice]
       ice_sriov_configure+0x8d7/0x1520 [ice]
       ? __percpu_ref_switch_mode+0x1b1/0x5d0
       ? __pfx_ice_sriov_configure+0x10/0x10 [ice]
    
    Sometimes a KASAN report can be seen instead with a similar stack trace:
      BUG: KASAN: use-after-free in __list_add_valid_or_report+0xf1/0x100
    
    VFs are added to this list in ice_mbx_init_vf_info(), but only removed
    in ice_free_vfs(). Move the removing to ice_free_vf_entries(), which is
    also being called in other places where VFs are being removed (including
    ice_free_vfs() itself).
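
    A minimal Python model of the bookkeeping problem, assuming a
    simplified PF object with a VF table and a mailbox list (all names
    here are hypothetical). In the kernel the stale entry is a dangling
    list node in freed memory; in this model it is merely a leftover
    element.

```python
class Vf:
    def __init__(self, vf_id):
        self.vf_id = vf_id


class Pf:
    def __init__(self):
        self.vfs = {}        # vf_id -> Vf
        self.mbx_list = []   # stands in for the PF-VF mailbox snapshot list


def create_vf(pf, vf_id):
    vf = Vf(vf_id)
    pf.vfs[vf_id] = vf
    pf.mbx_list.append(vf)   # analogous to what ice_mbx_init_vf_info() does
    return vf


def free_vf_entries(pf, unlink=True):
    """Drop all VFs; with unlink=False this mimics the pre-fix error path."""
    for vf in pf.vfs.values():
        if unlink:
            pf.mbx_list.remove(vf)
    pf.vfs.clear()
```

    With unlink=False the mailbox list keeps referring to VFs that no
    longer exist, which is the condition the fix removes by unlinking in
    the common teardown helper.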
    
    Fixes: 8cd8a6b17d27 ("ice: move VF overflow message count into struct ice_mbx_vf_info")
    Reported-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
    Closes: https://lore.kernel.org/intel-wired-lan/PH0PR11MB50138B635F2E5CEB7075325D961F2@PH0PR11MB5013.namprd11.prod.outlook.com
    Reviewed-by: Martyna Szapar-Mudlaw <martyna.szapar-mudlaw@linux.intel.com>
    Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    Link: https://patch.msgid.link/20250224190647.3601930-2-anthony.l.nguyen@intel.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

include: net: add static inline dst_dev_overhead() to dst.h [+ + +]

Author: Justin Iurman <justin.iurman@uliege.be>
Date:   Tue Dec 3 13:49:42 2024 +0100

    include: net: add static inline dst_dev_overhead() to dst.h
    
    [ Upstream commit 0600cf40e9b36fe17f9c9f04d4f9cef249eaa5e7 ]
    
    Add static inline dst_dev_overhead() function to include/net/dst.h. This
    helper function is used by ioam6_iptunnel, rpl_iptunnel and
    seg6_iptunnel to get the dev's overhead based on a cache entry
    (dst_entry). If the cache is empty, the default and generic value
    skb->mac_len is returned. Otherwise, LL_RESERVED_SPACE() over dst's dev
    is returned.
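
    The selection logic can be sketched in Python as follows. The
    dictionary-based device and the function names are assumptions for
    illustration; the rounding formula mirrors LL_RESERVED_SPACE() as
    defined in include/linux/netdevice.h to the best of my knowledge, so
    treat it as an assumption too.

```python
HH_DATA_MOD = 16  # alignment unit assumed for LL_RESERVED_SPACE()


def ll_reserved_space(hard_header_len, needed_headroom):
    """Round the link-layer header room down to 16 bytes, then add 16."""
    total = hard_header_len + needed_headroom
    return (total & ~(HH_DATA_MOD - 1)) + HH_DATA_MOD


def dst_dev_overhead(dst_dev, skb_mac_len):
    """Return the device overhead for a cached dst, or the generic default."""
    if dst_dev is None:
        # Empty cache: fall back to the skb's MAC header length.
        return skb_mac_len
    return ll_reserved_space(dst_dev["hard_header_len"],
                             dst_dev["needed_headroom"])
```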
    
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Cc: Alexander Lobakin <aleksander.lobakin@intel.com>
    Cc: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Stable-dep-of: c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6 lwt")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

intel_idle: Handle older CPUs, which stop the TSC in deeper C states, correctly [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Feb 25 23:37:08 2025 +0100

    intel_idle: Handle older CPUs, which stop the TSC in deeper C states, correctly
    
    commit c157d351460bcf202970e97e611cb6b54a3dd4a4 upstream.
    
    The Intel idle driver is preferred over the ACPI processor idle driver,
    but fails to implement the work around for Core2 generation CPUs, where
    the TSC stops in C2 and deeper C-states. This causes stalls and boot
    delays, when the clocksource watchdog does not catch the unstable TSC
    before the CPU goes deep idle for the first time.
    
    The ACPI driver marks the TSC unstable when it detects that the CPU
    supports C2 or deeper and the CPU does not have a non-stop TSC.
    
    Add the equivalent work around to the Intel idle driver to cure that.
    
    Fixes: 18734958e9bf ("intel_idle: Use ACPI _CST for processor models without C-state tables")
    Reported-by: Fab Stz <fabstz-it@yahoo.fr>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Fab Stz <fabstz-it@yahoo.fr>
    Cc: All applicable <stable@vger.kernel.org>
    Closes: https://lore.kernel.org/all/10cf96aa-1276-4bd4-8966-c890377030c3@yahoo.fr
    Link: https://patch.msgid.link/87bjupfy7f.ffs@tglx
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

io_uring/net: save msg_control for compat [+ + +]

Author: Pavel Begunkov <asml.silence@gmail.com>
Date:   Tue Feb 25 15:59:02 2025 +0000

    io_uring/net: save msg_control for compat
    
    [ Upstream commit 6ebf05189dfc6d0d597c99a6448a4d1064439a18 ]
    
    Match the compat part of io_sendmsg_copy_hdr() with its counterpart and
    save msg_control.
    
    Fixes: c55978024d123 ("io_uring/net: move receive multishot out of the generic msghdr path")
    Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
    Link: https://lore.kernel.org/r/2a8418821fe83d3b64350ad2b3c0303e9b732bbd.1740498502.git.asml.silence@gmail.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4: Convert icmp_route_lookup() to dscp_t. [+ + +]

Author: Guillaume Nault <gnault@redhat.com>
Date:   Tue Oct 1 21:28:37 2024 +0200

    ipv4: Convert icmp_route_lookup() to dscp_t.
    
    [ Upstream commit 913c83a610bb7dd8e5952a2b4663e1feec0b5de6 ]
    
    Pass a dscp_t variable to icmp_route_lookup(), instead of a plain u8,
    to prevent accidental setting of ECN bits in ->flowi4_tos. Rename that
    variable ("tos" -> "dscp") to make the intent clear.
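
    Why a dedicated type helps can be shown with a small Python stand-in
    for dscp_t. The Dscp class and route_lookup() below are hypothetical
    and only model the idea: a value of this type can never carry ECN
    bits, and passing a plain integer is rejected.

```python
class Dscp:
    """Toy stand-in for dscp_t: holds only the six DSCP bits."""

    MASK = 0xFC  # upper six bits of the DS field; the low two bits are ECN

    def __init__(self, dsfield):
        self.value = dsfield & self.MASK  # ECN bits are dropped here

    def to_dsfield(self):
        return self.value


def route_lookup(dscp):
    """Hypothetical lookup that only accepts the wrapper type."""
    if not isinstance(dscp, Dscp):
        raise TypeError("expected Dscp, got a plain integer")
    return {"flowi4_tos": dscp.to_dsfield()}
```

    In C the same protection comes from sparse type checking rather than
    a runtime check.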
    
    While there, reorganise the function parameters to fill up horizontal
    space.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/294fead85c6035bcdc5fcf9a6bb4ce8798c45ba1.1727807926.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4: Convert ip_route_input() to dscp_t. [+ + +]

Author: Guillaume Nault <gnault@redhat.com>
Date:   Tue Oct 1 21:28:43 2024 +0200

    ipv4: Convert ip_route_input() to dscp_t.
    
    [ Upstream commit 7e863e5db6185b1add0df4cb01b31a4ed1c4b738 ]
    
    Pass a dscp_t variable to ip_route_input(), instead of a plain u8, to
    prevent accidental setting of ECN bits in ->flowi4_tos.
    
    Callers of ip_route_input() to consider are:
    
      * input_action_end_dx4_finish() and input_action_end_dt4() in
        net/ipv6/seg6_local.c. These functions set the tos parameter to 0,
        which is already a valid dscp_t value, so they don't need to be
        adjusted for the new prototype.
    
      * icmp_route_lookup(), which already has a dscp_t variable to pass as
        parameter. We just need to remove the inet_dscp_to_dsfield()
        conversion.
    
      * br_nf_pre_routing_finish(), ip_options_rcv_srr() and ip4ip6_err(),
        which get the DSCP directly from IPv4 headers. Define a helper to
        read the .tos field of struct iphdr as dscp_t, so that these
        functions don't have to do the conversion manually.
    
    While there, declare *iph as const in br_nf_pre_routing_finish(),
    declare its local variables in reverse-christmas-tree order and move
    the "err = ip_route_input()" assignment out of the conditional to avoid
    checkpatch warning.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/e9d40781d64d3d69f4c79ac8a008b8d67a033e8d.1727807926.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4: icmp: Pass full DS field to ip_route_input() [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Wed Aug 21 15:52:49 2024 +0300

    ipv4: icmp: Pass full DS field to ip_route_input()
    
    [ Upstream commit 1c6f50b37f711b831d78973dad0df1da99ad0014 ]
    
    Align the ICMP code to other callers of ip_route_input() and pass the
    full DS field. In the future this will allow us to perform a route
    lookup according to the full DSCP value.
    
    No functional changes intended since the upper DSCP bits are masked when
    comparing against the TOS selectors in FIB rules and routes.
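
    The "no functional change" argument can be illustrated in Python.
    The mask value 0x1C is what I understand IPTOS_RT_MASK to be in this
    kernel series, and tos_matches() is a hypothetical helper, not a
    kernel function.

```python
IPTOS_RT_MASK = 0x1C  # assumed value: bits compared against TOS selectors


def tos_matches(selector, dsfield):
    """Compare a TOS selector against a DS field after masking."""
    return (dsfield & IPTOS_RT_MASK) == selector
```

    Two DS field values that differ only in the upper DSCP bits, such as
    0x10 and 0xF0, give the same result, so passing the full DS field
    does not change which rules or routes match.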
    
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Guillaume Nault <gnault@redhat.com>
    Acked-by: Florian Westphal <fw@strlen.de>
    Reviewed-by: David Ahern <dsahern@kernel.org>
    Link: https://patch.msgid.link/20240821125251.1571445-11-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipv4: icmp: Unmask upper DSCP bits in icmp_route_lookup() [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Aug 29 09:54:50 2024 +0300

    ipv4: icmp: Unmask upper DSCP bits in icmp_route_lookup()
    
    [ Upstream commit 4805646c42e51d2fbf142864d281473ad453ad5d ]
    
    The function is called to resolve a route for an ICMP message that is
    sent in response to a situation. Based on the type of the generated ICMP
    message, the function is either passed the DS field of the packet that
    generated the ICMP message or a DS field that is derived from it.
    
    Unmask the upper DSCP bits before resolving an output route via
    ip_route_output_key_hash() so that in the future the lookup could be
    performed according to the full DSCP value.
    
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipvlan: ensure network headers are in skb linear part [+ + +]

Author: Eric Dumazet <edumazet@google.com>
Date:   Thu Feb 20 15:53:36 2025 +0000

    ipvlan: ensure network headers are in skb linear part
    
    [ Upstream commit 27843ce6ba3d3122b65066550fe33fb8839f8aef ]
    
    syzbot found that ipvlan_process_v6_outbound() was assuming
    the IPv6 network header is present in skb->head [1]
    
    Add the needed pskb_network_may_pull() calls for both
    IPv4 and IPv6 handlers.
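
    What such a check guards against can be modeled in Python with a toy
    packet split into a linear area and a fragment area. The Skb class
    and may_pull() are simplified, hypothetical stand-ins for the kernel
    structures and for pskb_network_may_pull().

```python
IPV6_HDR_LEN = 40


class Skb:
    """Toy packet: a linear area plus data that lives in fragments."""

    def __init__(self, linear, frags):
        self.linear = bytearray(linear)
        self.frags = bytearray(frags)


def may_pull(skb, length):
    """Make the first `length` bytes linear; False if the packet is too short."""
    if length <= len(skb.linear):
        return True
    missing = length - len(skb.linear)
    if missing > len(skb.frags):
        return False
    skb.linear += skb.frags[:missing]
    del skb.frags[:missing]
    return True


def ipv6_dst(skb):
    """Read the destination address only after the header is linear."""
    if not may_pull(skb, IPV6_HDR_LEN):
        return None  # caller should drop the packet
    return bytes(skb.linear[24:40])
```

    Without the may_pull() call, ipv6_dst() would read past the end of
    the linear area whenever part of the header sits in fragments.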
    
    [1]
    BUG: KMSAN: uninit-value in __ipv6_addr_type+0xa2/0x490 net/ipv6/addrconf_core.c:47
      __ipv6_addr_type+0xa2/0x490 net/ipv6/addrconf_core.c:47
      ipv6_addr_type include/net/ipv6.h:555 [inline]
      ip6_route_output_flags_noref net/ipv6/route.c:2616 [inline]
      ip6_route_output_flags+0x51/0x720 net/ipv6/route.c:2651
      ip6_route_output include/net/ip6_route.h:93 [inline]
      ipvlan_route_v6_outbound+0x24e/0x520 drivers/net/ipvlan/ipvlan_core.c:476
      ipvlan_process_v6_outbound drivers/net/ipvlan/ipvlan_core.c:491 [inline]
      ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:541 [inline]
      ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:605 [inline]
      ipvlan_queue_xmit+0xd72/0x1780 drivers/net/ipvlan/ipvlan_core.c:671
      ipvlan_start_xmit+0x5b/0x210 drivers/net/ipvlan/ipvlan_main.c:223
      __netdev_start_xmit include/linux/netdevice.h:5150 [inline]
      netdev_start_xmit include/linux/netdevice.h:5159 [inline]
      xmit_one net/core/dev.c:3735 [inline]
      dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3751
      sch_direct_xmit+0x399/0xd40 net/sched/sch_generic.c:343
      qdisc_restart net/sched/sch_generic.c:408 [inline]
      __qdisc_run+0x14da/0x35d0 net/sched/sch_generic.c:416
      qdisc_run+0x141/0x4d0 include/net/pkt_sched.h:127
      net_tx_action+0x78b/0x940 net/core/dev.c:5484
      handle_softirqs+0x1a0/0x7c0 kernel/softirq.c:561
      __do_softirq+0x14/0x1a kernel/softirq.c:595
      do_softirq+0x9a/0x100 kernel/softirq.c:462
      __local_bh_enable_ip+0x9f/0xb0 kernel/softirq.c:389
      local_bh_enable include/linux/bottom_half.h:33 [inline]
      rcu_read_unlock_bh include/linux/rcupdate.h:919 [inline]
      __dev_queue_xmit+0x2758/0x57d0 net/core/dev.c:4611
      dev_queue_xmit include/linux/netdevice.h:3311 [inline]
      packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276
      packet_snd net/packet/af_packet.c:3132 [inline]
      packet_sendmsg+0x93e0/0xa7e0 net/packet/af_packet.c:3164
      sock_sendmsg_nosec net/socket.c:718 [inline]
    
    Fixes: 2ad7bf363841 ("ipvlan: Initial check-in of the IPVLAN driver.")
    Reported-by: syzbot+93ab4a777bafb9d9f960@syzkaller.appspotmail.com
    Closes: https://lore.kernel.org/netdev/67b74f01.050a0220.14d86d.02d8.GAE@google.com/T/#u
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Cc: Mahesh Bandewar <maheshb@google.com>
    Link: https://patch.msgid.link/20250220155336.61884-1-edumazet@google.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipvlan: Prepare ipvlan_process_v4_outbound() to future .flowi4_tos conversion. [+ + +]

Author: Guillaume Nault <gnault@redhat.com>
Date:   Wed Oct 30 13:43:11 2024 +0100

    ipvlan: Prepare ipvlan_process_v4_outbound() to future .flowi4_tos conversion.
    
    [ Upstream commit 0c30d6eedd1ec0c1382bcab9576d26413cd278a3 ]
    
    Use ip4h_dscp() to get the DSCP from the IPv4 header, then convert the
    dscp_t value to __u8 with inet_dscp_to_dsfield().
    
    Then, when we'll convert .flowi4_tos to dscp_t, we'll just have to drop
    the inet_dscp_to_dsfield() call.
    
    Signed-off-by: Guillaume Nault <gnault@redhat.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Link: https://patch.msgid.link/f48335504a05b3587e0081a9b4511e0761571ca5.1730292157.git.gnault@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipvlan: Unmask upper DSCP bits in ipvlan_process_v4_outbound() [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Aug 29 09:54:57 2024 +0300

    ipvlan: Unmask upper DSCP bits in ipvlan_process_v4_outbound()
    
    [ Upstream commit 939cd1abf080c629552a9c5e6db4c0509d13e4c7 ]
    
    Unmask the upper DSCP bits when calling ip_route_output_flow() so that
    in the future it could perform the FIB lookup according to the full DSCP
    value.
    
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Guillaume Nault <gnault@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ipvs: Always clear ipvs_property flag in skb_scrub_packet() [+ + +]

Author: Philo Lu <lulie@linux.alibaba.com>
Date:   Sat Feb 22 11:35:18 2025 +0800

    ipvs: Always clear ipvs_property flag in skb_scrub_packet()
    
    [ Upstream commit de2c211868b9424f9aa9b3432c4430825bafb41b ]
    
    We found an issue when using bpf_redirect with ipvs NAT mode after
    commit ff70202b2d1a ("dev_forward_skb: do not scrub skb mark within
    the same name space"). Particularly, we use bpf_redirect to return
    the skb directly back to the netif it comes from, i.e., xnet is
    false in skb_scrub_packet(), and then ipvs_property is preserved
    and SNAT is skipped in the rx path.
    
    ipvs_property has been already cleared when netns is changed in
    commit 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when
    SKB net namespace changed"). This patch clears it regardless of
    netns.
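
    The ordering change can be sketched with a Python dictionary standing
    in for the skb. Only the two fields relevant here are modeled, and
    the function is a hypothetical simplification of skb_scrub_packet().

```python
def scrub_packet(skb, xnet):
    """Toy model of the post-fix behaviour."""
    skb["ipvs_property"] = False  # now cleared unconditionally
    if not xnet:
        return skb                # same namespace: keep the mark
    skb["mark"] = 0               # only reset when crossing namespaces
    return skb
```

    Before the fix, the ipvs_property reset sat after the xnet check, so
    a packet redirected within the same namespace kept the flag.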
    
    Fixes: 2b5ec1a5f973 ("netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed")
    Signed-off-by: Philo Lu <lulie@linux.alibaba.com>
    Acked-by: Julian Anastasov <ja@ssi.bg>
    Link: https://patch.msgid.link/20250222033518.126087-1-lulie@linux.alibaba.com
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Linux: Linux 6.6.81 [+ + +]

Author: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date:   Fri Mar 7 16:45:57 2025 +0100

    Linux 6.6.81
    
    Link: https://lore.kernel.org/r/20250305174500.327985489@linuxfoundation.org
    Tested-by: Pavel Machek (CIP) <pavel@denx.de>
    Tested-by: SeongJae Park <sj@kernel.org>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Ron Economos <re@w6rz.net>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Shuah Khan <skhan@linuxfoundation.org>
    Tested-by: Hardik Garg <hargar@linux.microsoft.com>
    Link: https://lore.kernel.org/r/20250306151412.957725234@linuxfoundation.org
    Tested-by: Pavel Machek (CIP) <pavel@denx.de>
    Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
    Tested-by: Mark Brown <broonie@kernel.org>
    Tested-by: Peter Schneider <pschneider1968@googlemail.com>
    Tested-by: Jon Hunter <jonathanh@nvidia.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: always handle address removal under msk socket lock [+ + +]

Author: Paolo Abeni <pabeni@redhat.com>
Date:   Mon Feb 24 19:11:50 2025 +0100

    mptcp: always handle address removal under msk socket lock
    
    commit f865c24bc55158313d5779fc81116023a6940ca3 upstream.
    
    Syzkaller reported a lockdep splat in the PM control path:
    
      WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 sock_owned_by_me include/net/sock.h:1711 [inline]
      WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 msk_owned_by_me net/mptcp/protocol.h:363 [inline]
      WARNING: CPU: 0 PID: 6693 at ./include/net/sock.h:1711 mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788
      Modules linked in:
      CPU: 0 UID: 0 PID: 6693 Comm: syz.0.205 Not tainted 6.14.0-rc2-syzkaller-00303-gad1b832bf1cf #0
      Hardware name: Google Compute Engine/Google Compute Engine, BIOS Google 12/27/2024
      RIP: 0010:sock_owned_by_me include/net/sock.h:1711 [inline]
      RIP: 0010:msk_owned_by_me net/mptcp/protocol.h:363 [inline]
      RIP: 0010:mptcp_pm_nl_addr_send_ack+0x57c/0x610 net/mptcp/pm_netlink.c:788
      Code: 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ca 7b d3 f5 eb b9 e8 c3 7b d3 f5 90 0f 0b 90 e9 dd fb ff ff e8 b5 7b d3 f5 90 <0f> 0b 90 e9 3e fb ff ff 44 89 f1 80 e1 07 38 c1 0f 8c eb fb ff ff
      RSP: 0000:ffffc900034f6f60 EFLAGS: 00010283
      RAX: ffffffff8bee3c2b RBX: 0000000000000001 RCX: 0000000000080000
      RDX: ffffc90004d42000 RSI: 000000000000a407 RDI: 000000000000a408
      RBP: ffffc900034f7030 R08: ffffffff8bee37f6 R09: 0100000000000000
      R10: dffffc0000000000 R11: ffffed100bcc62e4 R12: ffff88805e6316e0
      R13: ffff88805e630c00 R14: dffffc0000000000 R15: ffff88805e630c00
      FS:  00007f7e9a7e96c0(0000) GS:ffff8880b8600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b2fd18ff8 CR3: 0000000032c24000 CR4: 00000000003526f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      Call Trace:
       <TASK>
       mptcp_pm_remove_addr+0x103/0x1d0 net/mptcp/pm.c:59
       mptcp_pm_remove_anno_addr+0x1f4/0x2f0 net/mptcp/pm_netlink.c:1486
       mptcp_nl_remove_subflow_and_signal_addr net/mptcp/pm_netlink.c:1518 [inline]
       mptcp_pm_nl_del_addr_doit+0x118d/0x1af0 net/mptcp/pm_netlink.c:1629
       genl_family_rcv_msg_doit net/netlink/genetlink.c:1115 [inline]
       genl_family_rcv_msg net/netlink/genetlink.c:1195 [inline]
       genl_rcv_msg+0xb1f/0xec0 net/netlink/genetlink.c:1210
       netlink_rcv_skb+0x206/0x480 net/netlink/af_netlink.c:2543
       genl_rcv+0x28/0x40 net/netlink/genetlink.c:1219
       netlink_unicast_kernel net/netlink/af_netlink.c:1322 [inline]
       netlink_unicast+0x7f6/0x990 net/netlink/af_netlink.c:1348
       netlink_sendmsg+0x8de/0xcb0 net/netlink/af_netlink.c:1892
       sock_sendmsg_nosec net/socket.c:718 [inline]
       __sock_sendmsg+0x221/0x270 net/socket.c:733
       ____sys_sendmsg+0x53a/0x860 net/socket.c:2573
       ___sys_sendmsg net/socket.c:2627 [inline]
       __sys_sendmsg+0x269/0x350 net/socket.c:2659
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      RIP: 0033:0x7f7e9998cde9
      Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f7e9a7e9038 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f7e99ba5fa0 RCX: 00007f7e9998cde9
      RDX: 000000002000c094 RSI: 0000400000000000 RDI: 0000000000000007
      RBP: 00007f7e99a0e2a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000000 R14: 00007f7e99ba5fa0 R15: 00007fff49231088
    
    Indeed the PM can try to send a RM_ADDR over a msk without first
    acquiring the msk socket lock.
    
    The bugged code-path comes from an early optimization: when there
    are no subflows, the PM should (usually) not send RM_ADDR
    notifications.
    
    The above statement is incorrect, as without locks another process
    could concurrently create a new subflow and cause the RM_ADDR generation.
    
    Additionally the supposed optimization is not very effective even
    performance-wise, as most mptcp sockets should have at least one
    subflow: the MPC one.
    
    Address the issue by removing the buggy code path; the existing
    "slow-path" will correctly handle even the edge case.
    
    Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink")
    Cc: stable@vger.kernel.org
    Reported-by: syzbot+cd3ce3d03a3393ae9700@syzkaller.appspotmail.com
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/546
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Link: https://patch.msgid.link/20250224-net-mptcp-misc-fixes-v1-1-f550f636b435@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

mptcp: reset when MPTCP opts are dropped after join [+ + +]

Author: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Date:   Mon Feb 24 19:11:51 2025 +0100

    mptcp: reset when MPTCP opts are dropped after join
    
    commit 8668860b0ad32a13fcd6c94a0995b7aa7638c9ef upstream.
    
    Before this patch, if the checksum was not used, the subflow was only
    reset if map_data_len was != 0. If there were no MPTCP options or an
    invalid mapping, map_data_len was not set to the data len, and then the
    subflow was not reset as it should have been, leaving the MPTCP
    connection in a wrong fallback mode.
    
    This map_data_len condition has been introduced to handle the reception
    of the infinite mapping. Instead, a new dedicated mapping error could
    have been returned and treated as a special case. However, the commit
    31bf11de146c ("mptcp: introduce MAPPING_BAD_CSUM") has been introduced
    by Paolo Abeni soon after, and backported later on to stable. It better
    handles the csum case, and it means the exception for valid_csum_seen in
    subflow_can_fallback(), plus this one for the infinite mapping in
    subflow_check_data_avail(), are no longer needed.
    
    In other words, the code can be simplified there: a fallback should only
    be done if msk->allow_infinite_fallback is set. This boolean is set to
    false once MPTCP-specific operations acting on the whole MPTCP
    connection vs the initial path have been done, e.g. a second path has
    been created, or an MPTCP re-injection -- yes, possible even with a
    single subflow. The subflow_can_fallback() helper can then be dropped,
    and replaced by this single condition.
    
    This also makes the code clearer: a fallback should only be done if it
    is possible to do so.
    
    While at it, no need to set map_data_len to 0 in get_mapping_status()
    for the infinite mapping case: it will be set to skb->len just after, at
    the end of subflow_check_data_avail(), and not read in between.
    
    Fixes: f8d4bcacff3b ("mptcp: infinite mapping receiving")
    Cc: stable@vger.kernel.org
    Reported-by: Chester A. Unal <chester.a.unal@xpedite-tech.com>
    Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/544
    Acked-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
    Tested-by: Chester A. Unal <chester.a.unal@xpedite-tech.com>
    Link: https://patch.msgid.link/20250224-net-mptcp-misc-fixes-v1-2-f550f636b435@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net/ipv4: add tracepoint for icmp_send [+ + +]

Author: Peilin He <he.peilin@zte.com.cn>
Date:   Tue May 7 15:41:03 2024 +0800

    net/ipv4: add tracepoint for icmp_send
    
    [ Upstream commit db3efdcf70c752e8a8deb16071d8e693c3ef8746 ]
    
    Introduce a tracepoint for icmp_send, which helps users conveniently get
    more detailed information when abnormal ICMP events happen.
    
    1. Use case example:
    ====================
    When an application experiences packet loss due to an unreachable UDP
    destination port, the kernel will send an exception message through the
    icmp_send function. By adding a trace point for icmp_send, developers or
    system administrators can obtain detailed information about the UDP
    packet loss, including the type, code, source address, destination address,
    source port, and destination port. This facilitates the trouble-shooting
    of UDP packet loss issues especially for those network-service
    applications.
    
    2. Operation Instructions:
    ==========================
    Switch to the tracing directory.
            cd /sys/kernel/tracing
    Filter for destination port unreachable.
            echo "type==3 && code==3" > events/icmp/icmp_send/filter
    Enable trace event.
            echo 1 > events/icmp/icmp_send/enable
    
    3. Result View:
    ================
     udp_client_erro-11370   [002] ...s.12   124.728002:
     icmp_send: icmp_send: type=3, code=3.
     From 127.0.0.1:41895 to 127.0.0.1:6666 ulen=23
     skbaddr=00000000589b167a
    
    Signed-off-by: Peilin He <he.peilin@zte.com.cn>
    Signed-off-by: xu xin <xu.xin16@zte.com.cn>
    Reviewed-by: Yunkai Zhang <zhang.yunkai@zte.com.cn>
    Cc: Yang Yang <yang.yang29@zte.com.cn>
    Cc: Liu Chun <liu.chun2@zte.com.cn>
    Cc: Xuexin Jiang <jiang.xuexin@zte.com.cn>
    Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 27843ce6ba3d ("ipvlan: ensure network headers are in skb linear part")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net/mlx5: IRQ, Fix null string in debug print [+ + +]

Author: Shay Drory <shayd@nvidia.com>
Date:   Tue Feb 25 09:26:08 2025 +0200

    net/mlx5: IRQ, Fix null string in debug print
    
    [ Upstream commit 2f5a6014eb168a97b24153adccfa663d3b282767 ]
    
    irq_pool_alloc() debug print can print a null string.
    Fix it by providing a default string to print.
    
    Fixes: 71e084e26414 ("net/mlx5: Allocating a pool of MSI-X vectors for SFs")
    Signed-off-by: Shay Drory <shayd@nvidia.com>
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202501141055.SwfIphN0-lkp@intel.com/
    Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
    Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
    Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
    Link: https://patch.msgid.link/20250225072608.526866-4-tariqt@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: cadence: macb: Synchronize stats calculations [+ + +]

Author: Sean Anderson <sean.anderson@linux.dev>
Date:   Thu Feb 20 11:29:50 2025 -0500

    net: cadence: macb: Synchronize stats calculations
    
    [ Upstream commit fa52f15c745ce55261b92873676f64f7348cfe82 ]
    
    Stats calculations involve a RMW to add the stat update to the existing
    value. This is currently not protected by any synchronization mechanism,
    so data races are possible. Add a spinlock to protect the update. The
    reader side could be protected using u64_stats, but we would still need
    a spinlock for the update side anyway. And we always do an update
    immediately before reading the stats anyway.
    
    Fixes: 89e5785fc8a6 ("[PATCH] Atmel MACB ethernet driver")
    Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
    Link: https://patch.msgid.link/20250220162950.95941-1-sean.anderson@linux.dev
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: Clear old fragment checksum value in napi_reuse_skb [+ + +]

Author: Mohammad Heib <mheib@redhat.com>
Date:   Tue Feb 25 13:28:52 2025 +0200

    net: Clear old fragment checksum value in napi_reuse_skb
    
    [ Upstream commit 49806fe6e61b045b5be8610e08b5a3083c109aa0 ]
    
    In certain cases, napi_get_frags() returns an skb that points to an old
    received fragment. This skb may have its skb->ip_summed, csum, and other
    fields set from previous fragment handling.
    
    Some network drivers set skb->ip_summed to either CHECKSUM_COMPLETE or
    CHECKSUM_UNNECESSARY when getting an skb from napi_get_frags(), while
    others only set skb->ip_summed when RX checksum offload is enabled on
    the device, and do not set any value for skb->ip_summed when hardware
    checksum offload is disabled, assuming that skb->ip_summed is
    initialized to zero by napi_reuse_skb(). The ionic driver, for example,
    does not set the ip_summed field when HW checksum offload is
    disabled. If the user disables checksum offload while traffic is
    flowing, this can lead to the following errors in the kernel logs:
    <IRQ>
    dump_stack_lvl+0x34/0x48
     __skb_gro_checksum_complete+0x7e/0x90
    tcp6_gro_receive+0xc6/0x190
    ipv6_gro_receive+0x1ec/0x430
    dev_gro_receive+0x188/0x360
    ? ionic_rx_clean+0x25a/0x460 [ionic]
    napi_gro_frags+0x13c/0x300
    ? __pfx_ionic_rx_service+0x10/0x10 [ionic]
    ionic_rx_service+0x67/0x80 [ionic]
    ionic_cq_service+0x58/0x90 [ionic]
    ionic_txrx_napi+0x64/0x1b0 [ionic]
     __napi_poll+0x27/0x170
    net_rx_action+0x29c/0x370
    handle_softirqs+0xce/0x270
    __irq_exit_rcu+0xa3/0xc0
    common_interrupt+0x80/0xa0
    </IRQ>
    
    This inconsistency sometimes leads to checksum validation issues in the
    upper layers of the network stack.
    
    To resolve this, this patch clears the skb->ip_summed value for each
    reused skb in napi_reuse_skb(), ensuring that the caller is responsible
    for setting the correct checksum status. This eliminates potential
    checksum validation issues caused by improper handling of
    skb->ip_summed.
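    
    The effect can be sketched with a simplified userspace model. The
    struct, enum, and function names below are hypothetical stand-ins, not
    the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified checksum states. */
enum csum_state { CSUM_NONE = 0, CSUM_UNNECESSARY = 1, CSUM_COMPLETE = 2 };

struct mock_skb {
    enum csum_state ip_summed; /* checksum status left by the last user */
    unsigned int len;          /* payload length */
};

/* Reset per-packet state before the buffer is handed out again. */
static void mock_reuse_skb(struct mock_skb *skb)
{
    assert(skb != NULL);
    skb->len = 0;
    skb->ip_summed = CSUM_NONE; /* drop the stale checksum status */
}

/* A driver that only sets ip_summed when RX checksum offload is enabled. */
static void mock_driver_rx(struct mock_skb *skb, unsigned int len, int rx_csum_on)
{
    skb->len = len;
    if (rx_csum_on)
        skb->ip_summed = CSUM_UNNECESSARY;
}
```

    Without the reset in mock_reuse_skb(), a buffer last used with offload
    enabled would still claim a verified checksum after offload is turned
    off.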
    
    Fixes: 76620aafd66f ("gro: New frags interface to avoid copying shinfo")
    Signed-off-by: Mohammad Heib <mheib@redhat.com>
    Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20250225112852.2507709-1-mheib@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: enetc: correct the xdp_tx statistics [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Mon Feb 24 19:12:46 2025 +0800

    net: enetc: correct the xdp_tx statistics
    
    commit 432a2cb3ee97a7c6ea578888fe81baad035b9307 upstream.
    
    The 'xdp_tx' is used to count the number of XDP_TX frames sent, not the
    number of Tx BDs.
    
    Fixes: 7ed2bc80074e ("net: enetc: add support for XDP_TX")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20250224111251.1061098-4-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: enetc: fix the off-by-one issue in enetc_map_tx_buffs() [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Mon Feb 24 19:12:44 2025 +0800

    net: enetc: fix the off-by-one issue in enetc_map_tx_buffs()
    
    commit 39ab773e4c120f7f98d759415ccc2aca706bbc10 upstream.
    
    When a DMA mapping error occurs while processing skb frags, it will free
    one more tx_swbd than expected, so fix this off-by-one issue.
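    
    A generic sketch of an error unwind that releases exactly as many ring
    slots as were mapped. The ring model, its size, and the function names
    are hypothetical; this is not the enetc code.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8 /* hypothetical ring size */

struct mock_ring {
    bool mapped[RING_SIZE]; /* true while a slot holds a live mapping */
};

/* Release exactly `count` slots, walking backwards from slot `last`. */
static void unwind_tx(struct mock_ring *r, int last, int count)
{
    assert(count >= 0 && count <= RING_SIZE);
    while (count--) {
        r->mapped[last] = false;
        last = (last == 0) ? RING_SIZE - 1 : last - 1;
    }
}

/*
 * Map `nfrags` slots starting at `start`. Fragment number `fail_at`
 * (0-based, -1 for none) simulates a DMA mapping error.
 * Returns the number of slots left mapped.
 */
static int map_tx(struct mock_ring *r, int start, int nfrags, int fail_at)
{
    int i = start, count = 0;

    for (int f = 0; f < nfrags; f++) {
        if (f == fail_at) {
            /* The failed slot was never mapped: start at the previous one. */
            int last = (i == 0) ? RING_SIZE - 1 : i - 1;

            unwind_tx(r, last, count);
            return 0;
        }
        r->mapped[i] = true;
        count++;
        i = (i + 1) % RING_SIZE;
    }
    return count;
}
```

    An unwind loop that ran count + 1 times would also release the slot
    just before `start`, which belongs to an earlier frame.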
    
    Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
    Cc: stable@vger.kernel.org
    Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Suggested-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
    Link: https://patch.msgid.link/20250224111251.1061098-2-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs() [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Mon Feb 24 19:12:51 2025 +0800

    net: enetc: fix the off-by-one issue in enetc_map_tx_tso_buffs()
    
    commit 249df695c3ffe8c8d36d46c2580ce72410976f96 upstream.
    
    There is an off-by-one issue in the err_chained_bd path: it frees one
    more tx_swbd than expected. The err_map_data path has no such issue. To
    fix the off-by-one and make the two error paths consistent, keep the
    increments of 'i' and 'count' in sync and call enetc_unwind_tx_frame()
    for error handling.
    
    Fixes: fb8629e2cbfc ("net: enetc: add support for software TSO")
    Cc: stable@vger.kernel.org
    Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
    Link: https://patch.msgid.link/20250224111251.1061098-9-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: enetc: keep track of correct Tx BD count in enetc_map_tx_tso_buffs() [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Mon Feb 24 19:12:45 2025 +0800

    net: enetc: keep track of correct Tx BD count in enetc_map_tx_tso_buffs()
    
    commit da291996b16ebd10626d4b20288327b743aff110 upstream.
    
    When creating a TSO header, if the skb is VLAN tagged, the extended BD
    will be used and the 'count' should be increased by 2 instead of 1.
    Otherwise, when an error occurs, fewer tx_swbd entries are freed than
    were actually used.
    
    Fixes: fb8629e2cbfc ("net: enetc: add support for software TSO")
    Cc: stable@vger.kernel.org
    Suggested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Reviewed-by: Claudiu Manoil <claudiu.manoil@nxp.com>
    Link: https://patch.msgid.link/20250224111251.1061098-3-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: enetc: update UDP checksum when updating originTimestamp field [+ + +]

Author: Wei Fang <wei.fang@nxp.com>
Date:   Mon Feb 24 19:12:48 2025 +0800

    net: enetc: update UDP checksum when updating originTimestamp field
    
    commit bbcbc906ab7b5834c1219cd17a38d78dba904aa0 upstream.
    
    There is an issue with one-step timestamp based on UDP/IP. The peer will
    discard the sync packet because of the wrong UDP checksum. For ENETC v1,
    the software needs to update the UDP checksum when updating the
    originTimestamp field, so that the hardware can correctly update the UDP
    checksum when updating the correction field. Otherwise, the UDP checksum
    in the sync packet will be wrong.
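    
    One common way to keep a checksum valid after rewriting a field is the
    incremental update from RFC 1624. The sketch below illustrates that
    technique on 16-bit words in userspace C. It is an assumption that this
    matches the spirit of the fix; the driver's actual helpers are not shown
    here and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator into 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words. */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    assert(words != NULL);
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold32(sum);
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one word changing m -> m'. */
static uint16_t csum_update16(uint16_t check, uint16_t old_word, uint16_t new_word)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_word;
    sum += new_word;
    return (uint16_t)~csum_fold32(sum);
}
```

    Updating a multi-word field such as a timestamp applies csum_update16()
    once per changed 16-bit word.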
    
    Fixes: 7294380c5211 ("enetc: support PTP Sync packet one-step timestamping")
    Cc: stable@vger.kernel.org
    Signed-off-by: Wei Fang <wei.fang@nxp.com>
    Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
    Link: https://patch.msgid.link/20250224111251.1061098-6-wei.fang@nxp.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

net: ipv6: fix dst ref loop on input in rpl lwt [+ + +]

Author: Justin Iurman <justin.iurman@uliege.be>
Date:   Tue Feb 25 18:51:39 2025 +0100

    net: ipv6: fix dst ref loop on input in rpl lwt
    
    [ Upstream commit 13e55fbaec176119cff68a7e1693b251c8883c5f ]
    
    Prevent a dst ref loop on input in rpl_iptunnel.
    
    Fixes: a7a29f9c361f ("net: ipv6: add rpl sr tunnel")
    Cc: Alexander Aring <alex.aring@gmail.com>
    Cc: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ipv6: fix dst ref loop on input in seg6 lwt [+ + +]

Author: Justin Iurman <justin.iurman@uliege.be>
Date:   Tue Feb 25 18:51:38 2025 +0100

    net: ipv6: fix dst ref loop on input in seg6 lwt
    
    [ Upstream commit c64a0727f9b1cbc63a5538c8c0014e9a175ad864 ]
    
    Prevent a dst ref loop on input in seg6_iptunnel.
    
    Fixes: af4a2209b134 ("ipv6: sr: use dst_cache in seg6_input")
    Cc: David Lebrun <dlebrun@google.com>
    Cc: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Ido Schimmel <idosch@nvidia.com>
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ipv6: rpl_iptunnel: mitigate 2-realloc issue [+ + +]

Author: Justin Iurman <justin.iurman@uliege.be>
Date:   Tue Dec 3 13:49:45 2024 +0100

    net: ipv6: rpl_iptunnel: mitigate 2-realloc issue
    
    [ Upstream commit 985ec6f5e6235242191370628acb73d7a9f0c0ea ]
    
    This patch mitigates the two-reallocations issue with rpl_iptunnel by
    providing the dst_entry (in the cache) to the first call to
    skb_cow_head(). As a result, the very first iteration would still
    trigger two reallocations (i.e., empty cache), while next iterations
    would only trigger a single reallocation.
    
    Performance tests before and after applying this patch clearly show
    that there is no negative impact (they even show an improvement):
    - before: https://ibb.co/nQJhqwc
    - after: https://ibb.co/4ZvW6wV
    
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Cc: Alexander Aring <aahringo@redhat.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Stable-dep-of: 13e55fbaec17 ("net: ipv6: fix dst ref loop on input in rpl lwt")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ipv6: seg6_iptunnel: mitigate 2-realloc issue [+ + +]

Author: Justin Iurman <justin.iurman@uliege.be>
Date:   Tue Dec 3 13:49:44 2024 +0100

    net: ipv6: seg6_iptunnel: mitigate 2-realloc issue
    
    [ Upstream commit 40475b63761abb6f8fdef960d03228a08662c9c4 ]
    
    This patch mitigates the two-reallocations issue with seg6_iptunnel by
    providing the dst_entry (in the cache) to the first call to
    skb_cow_head(). As a result, the very first iteration would still
    trigger two reallocations (i.e., empty cache), while next iterations
    would only trigger a single reallocation.
    
    Performance tests before and after applying this patch clearly show
    the improvement:
    - before: https://ibb.co/3Cg4sNH
    - after: https://ibb.co/8rQ350r
    
    Signed-off-by: Justin Iurman <justin.iurman@uliege.be>
    Cc: David Lebrun <dlebrun@google.com>
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Stable-dep-of: c64a0727f9b1 ("net: ipv6: fix dst ref loop on input in seg6 lwt")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: loopback: Avoid sending IP packets without an Ethernet header [+ + +]

Author: Ido Schimmel <idosch@nvidia.com>
Date:   Thu Feb 20 09:25:59 2025 +0200

    net: loopback: Avoid sending IP packets without an Ethernet header
    
    [ Upstream commit 0e4427f8f587c4b603475468bb3aee9418574893 ]
    
    After commit 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
    IPv4 neighbors can be constructed on the blackhole net device, but they
    are constructed with an output function (neigh_direct_output()) that
    simply calls dev_queue_xmit(). The latter will transmit packets via
    'skb->dev' which might not be the blackhole net device if dst_dev_put()
    switched 'dst->dev' to the blackhole net device while another CPU was
    using the dst entry in ip_output(), but after it already initialized
    'skb->dev' from 'dst->dev'.
    
    Specifically, the following can happen:
    
        CPU1                                      CPU2
    
    udp_sendmsg(sk1)                          udp_sendmsg(sk2)
    udp_send_skb()                            [...]
    ip_output()
        skb->dev = skb_dst(skb)->dev
                                              dst_dev_put()
                                                  dst->dev = blackhole_netdev
    ip_finish_output2()
        resolves neigh on dst->dev
    neigh_output()
    neigh_direct_output()
    dev_queue_xmit()
    
    This will result in IPv4 packets being sent without an Ethernet header
    via a valid net device:
    
    tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
    listening on enp9s0, link-type EN10MB (Ethernet), snapshot length 262144 bytes
    22:07:02.329668 20:00:40:11:18:fb > 45:00:00:44:f4:94, ethertype Unknown
    (0x58c6), length 68:
            0x0000:  8dda 74ca f1ae ca6c ca6c 0098 969c 0400  ..t....l.l......
            0x0010:  0000 4730 3f18 6800 0000 0000 0000 9971  ..G0?.h........q
            0x0020:  c4c9 9055 a157 0a70 9ead bf83 38ca ab38  ...U.W.p....8..8
            0x0030:  8add ab96 e052                           .....R
    
    Fix by making sure that neighbors are constructed on top of the
    blackhole net device with an output function that simply consumes the
    packets, in a similar fashion to dst_discard_out() and
    blackhole_netdev_xmit().
    
    Fixes: 8d7017fd621d ("blackhole_netdev: use blackhole_netdev to invalidate dst entries")
    Fixes: 22600596b675 ("ipv4: give an IPv4 dev to blackhole_netdev")
    Reported-by: Florian Meister <fmei@sfs.com>
    Closes: https://lore.kernel.org/netdev/20250210084931.23a5c2e4@hermes.local/
    Signed-off-by: Ido Schimmel <idosch@nvidia.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Link: https://patch.msgid.link/20250220072559.782296-1-idosch@nvidia.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination. [+ + +]

Author: Harshal Chaudhari <hchaudhari@marvell.com>
Date:   Mon Feb 24 20:20:58 2025 -0800

    net: mvpp2: cls: Fixed Non IP flow, with vlan tag flow defination.
    
    [ Upstream commit 2d253726ff7106b39a44483b6864398bba8a2f74 ]
    
    A non-IP flow with a VLAN tag did not work as expected when
    running the command below for vlan-priority. Fix that.
    
    ethtool -N eth1 flow-type ether vlan 0x8000 vlan-mask 0x1fff action 0 loc 0
    
    Fixes: 1274daede3ef ("net: mvpp2: cls: Add steering based on vlan Id and priority.")
    Signed-off-by: Harshal Chaudhari <hchaudhari@marvell.com>
    Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
    Link: https://patch.msgid.link/20250225042058.2643838-1-hchaudhari@marvell.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: set the minimum for net_hotdata.netdev_budget_usecs [+ + +]

Author: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Date:   Thu Feb 20 12:07:52 2025 +0100

    net: set the minimum for net_hotdata.netdev_budget_usecs
    
    [ Upstream commit c180188ec02281126045414e90d08422a80f75b4 ]
    
    Commit 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs
    to enable softirq tuning") added a possibility to set
    net_hotdata.netdev_budget_usecs, but added no lower bound checking.
    
    Commit a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies")
    made the *initial* value HZ-dependent, so the initial value is at least
    2 jiffies even for lower HZ values (2 ms for 1000 Hz, 8 ms for 250 Hz, 20
    ms for 100 Hz).
    
    But a user still can set improper values by a sysctl. Set .extra1
    (the lower bound) for net_hotdata.netdev_budget_usecs to the same value
    as in the latter commit. That is to 2 jiffies.
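    
    The arithmetic behind the bound can be sketched as follows. The helper
    names are hypothetical, and the validity check only models what a lower
    bound on the sysctl is meant to enforce.

```c
#include <assert.h>
#include <stdbool.h>

#define USEC_PER_SEC 1000000L

/* Two jiffies expressed in microseconds for a tick rate of `hz`. */
static long two_jiffies_usecs(long hz)
{
    assert(hz > 0);
    return 2 * USEC_PER_SEC / hz;
}

/* Hypothetical check: is a requested budget at or above the lower bound? */
static bool budget_usecs_valid(long requested, long hz)
{
    return requested >= two_jiffies_usecs(hz);
}
```

    For HZ values of 1000, 250, and 100 this gives 2000, 8000, and 20000
    microseconds, matching the 2 ms, 8 ms, and 20 ms figures above.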
    
    Fixes: a4837980fd9f ("net: revert default NAPI poll timeout to 2 jiffies")
    Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning")
    Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
    Cc: Dmitry Yakunin <zeil@yandex-team.ru>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Link: https://patch.msgid.link/20250220110752.137639-1-jirislaby@kernel.org
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ti: icss-iep: Reject perout generation request [+ + +]

Author: Meghana Malladi <m-malladi@ti.com>
Date:   Thu Feb 27 14:54:41 2025 +0530

    net: ti: icss-iep: Reject perout generation request
    
    [ Upstream commit 54e1b4becf5e220be03db4e1be773c1310e8cbbd ]
    
    The IEP driver supports both perout and pps signal generation,
    but the perout feature is faulty: its support is incomplete
    because some configuration is missing. Remove perout
    support from the driver and reject perout requests with a
    "not supported" error code.
    
    Fixes: c1e0230eeaab2 ("net: ti: icss-iep: Add IEP driver")
    Signed-off-by: Meghana Malladi <m-malladi@ti.com>
    Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
    Link: https://patch.msgid.link/20250227092441.1848419-1-m-malladi@ti.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

net: ti: icss-iep: Remove spinlock-based synchronization [+ + +]

Author: Diogo Ivo <diogo.ivo@siemens.com>
Date:   Mon Jun 17 16:21:41 2024 +0100

    net: ti: icss-iep: Remove spinlock-based synchronization
    
    [ Upstream commit 5758e03cf604aa282b9afa61aec3188c4a9b3fe7 ]
    
    As all sources of concurrency in hardware register access occur in
    non-interrupt context, eliminate spinlock-based synchronization and
    rely on the mutex-based synchronization that is already present.
    
    Signed-off-by: Diogo Ivo <diogo.ivo@siemens.com>
    Reviewed-by: Simon Horman <horms@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Stable-dep-of: 54e1b4becf5e ("net: ti: icss-iep: Reject perout generation request")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up [+ + +]

Author: Vasiliy Kovalev <kovalev@altlinux.org>
Date:   Sat Feb 15 00:51:48 2025 +0300

    ovl: fix UAF in ovl_dentry_update_reval by moving dput() in ovl_link_up
    
    [ Upstream commit c84e125fff2615b4d9c259e762596134eddd2f27 ]
    
    The issue was caused by dput(upper) being called before
    ovl_dentry_update_reval(), while upper->d_flags was still
    accessed in ovl_dentry_remote().
    
    Move dput(upper) after its last use to prevent use-after-free.
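    
    The ordering rule can be sketched with a toy refcounted object in
    userspace C. All names here are hypothetical and only model the
    "read first, drop the reference last" pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a dentry. */
struct mock_dentry {
    int refcount;
    unsigned int flags;
};

static int live_dentries; /* number of objects not yet freed */

static struct mock_dentry *mock_dentry_new(unsigned int flags)
{
    struct mock_dentry *d = malloc(sizeof(*d));

    assert(d != NULL);
    d->refcount = 1;
    d->flags = flags;
    live_dentries++;
    return d;
}

/* Drop one reference; the object is freed when the last one goes away. */
static void mock_dput(struct mock_dentry *d)
{
    if (--d->refcount == 0) {
        live_dentries--;
        free(d);
    }
}

/* Read everything needed from `upper`, and only then drop the reference. */
static unsigned int link_up_flags(struct mock_dentry *upper)
{
    unsigned int flags = upper->flags; /* last use of `upper` */

    mock_dput(upper); /* nothing touches `upper` after this point */
    return flags;
}
```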
    
    BUG: KASAN: slab-use-after-free in ovl_dentry_remote fs/overlayfs/util.c:162 [inline]
    BUG: KASAN: slab-use-after-free in ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167
    
    Call Trace:
     <TASK>
     __dump_stack lib/dump_stack.c:88 [inline]
     dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:114
     print_address_description mm/kasan/report.c:377 [inline]
     print_report+0xc3/0x620 mm/kasan/report.c:488
     kasan_report+0xd9/0x110 mm/kasan/report.c:601
     ovl_dentry_remote fs/overlayfs/util.c:162 [inline]
     ovl_dentry_update_reval+0xd2/0xf0 fs/overlayfs/util.c:167
     ovl_link_up fs/overlayfs/copy_up.c:610 [inline]
     ovl_copy_up_one+0x2105/0x3490 fs/overlayfs/copy_up.c:1170
     ovl_copy_up_flags+0x18d/0x200 fs/overlayfs/copy_up.c:1223
     ovl_rename+0x39e/0x18c0 fs/overlayfs/dir.c:1136
     vfs_rename+0xf84/0x20a0 fs/namei.c:4893
    ...
     </TASK>
    
    Fixes: b07d5cc93e1b ("ovl: update of dentry revalidate flags after copy up")
    Reported-by: syzbot+316db8a1191938280eb6@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=316db8a1191938280eb6
    Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
    Link: https://lore.kernel.org/r/20250214215148.761147-1-kovalev@altlinux.org
    Reviewed-by: Amir Goldstein <amir73il@gmail.com>
    Signed-off-by: Christian Brauner <brauner@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/core: Add RCU read lock protection to perf_iterate_ctx() [+ + +]

Author: Breno Leitao <leitao@debian.org>
Date:   Fri Jan 17 06:41:07 2025 -0800

    perf/core: Add RCU read lock protection to perf_iterate_ctx()
    
    commit 0fe8813baf4b2e865d3b2c735ce1a15b86002c74 upstream.
    
    The perf_iterate_ctx() function performs RCU list traversal but
    currently lacks RCU read lock protection. This causes lockdep warnings
    when running perf probe with unshare(1) under CONFIG_PROVE_RCU_LIST=y:
    
            WARNING: suspicious RCU usage
            kernel/events/core.c:8168 RCU-list traversed in non-reader section!!
    
             Call Trace:
              lockdep_rcu_suspicious
              ? perf_event_addr_filters_apply
              perf_iterate_ctx
              perf_event_exec
              begin_new_exec
              ? load_elf_phdrs
              load_elf_binary
              ? lock_acquire
              ? find_held_lock
              ? bprm_execve
              bprm_execve
              do_execveat_common.isra.0
              __x64_sys_execve
              do_syscall_64
              entry_SYSCALL_64_after_hwframe
    
    This protection was previously present but was removed in commit
    bd2756811766 ("perf: Rewrite core context handling"). Add back the
    necessary rcu_read_lock()/rcu_read_unlock() pair around
    perf_iterate_ctx() call in perf_event_exec().
    
    [ mingo: Use scoped_guard() as suggested by Peter ]
    
    Fixes: bd2756811766 ("perf: Rewrite core context handling")
    Signed-off-by: Breno Leitao <leitao@debian.org>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20250117-fix_perf_rcu-v1-1-13cb9210fc6a@debian.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/core: Fix low freq setting via IOC_PERIOD [+ + +]

Author: Kan Liang <kan.liang@linux.intel.com>
Date:   Fri Jan 17 07:19:12 2025 -0800

    perf/core: Fix low freq setting via IOC_PERIOD
    
    commit 0d39844150546fa1415127c5fbae26db64070dd3 upstream.
    
    A low attr::freq value cannot be set via IOC_PERIOD on some platforms.
    
    The perf_event_check_period() introduced in:
    
      81ec3f3c4c4d ("perf/x86: Add check_period PMU callback")
    
    was intended to check the period, rather than the frequency.
    A low frequency may be mistakenly rejected by limit_period().
    
    Fix it.
    
    Fixes: 81ec3f3c4c4d ("perf/x86: Add check_period PMU callback")
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20250117151913.3043942-2-kan.liang@linux.intel.com
    Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list [+ + +]

Author: Luo Gengkun <luogengkun@huaweicloud.com>
Date:   Wed Jan 22 07:33:56 2025 +0000

    perf/core: Order the PMU list to fix warning about unordered pmu_ctx_list
    
    [ Upstream commit 2016066c66192a99d9e0ebf433789c490a6785a2 ]
    
    Syzkaller triggers a warning due to prev_epc->pmu != next_epc->pmu in
    perf_event_swap_task_ctx_data(). vmcore shows that two lists have the same
    perf_event_pmu_context, but not in the same order.
    
    The problem is that the order of pmu_ctx_list for the parent is determined
    by the time when an event/PMU is added, while the order for a child is
    determined by the event order in pinned_groups and flexible_groups. So
    the order of pmu_ctx_list in the parent and child may be different.
    
    To fix this problem, insert the perf_event_pmu_context to its proper place
    after iteration of the pmu_ctx_list.
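    
    Ordered insertion makes the resulting list independent of the order in
    which entries arrive. A minimal sketch on a singly linked list follows.
    The node type and the integer key are hypothetical, since the ordering
    key used by the kernel is not stated here.

```c
#include <assert.h>
#include <stddef.h>

struct mock_ctx {
    int key;               /* hypothetical ordering key */
    struct mock_ctx *next;
};

/* Insert `node` so that the list stays sorted by ascending key. */
static void insert_sorted(struct mock_ctx **head, struct mock_ctx *node)
{
    struct mock_ctx **pos = head;

    assert(head != NULL && node != NULL);
    while (*pos && (*pos)->key < node->key)
        pos = &(*pos)->next; /* skip entries that sort before `node` */
    node->next = *pos;
    *pos = node;
}
```

    Two lists built from the same keys in different arrival orders end up
    identical, which is the property the parent and child contexts need.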
    
    The following testcase can trigger the above warning:
    
     # perf record -e cycles --call-graph lbr -- taskset -c 3 ./a.out &
     # perf stat -e cpu-clock,cs -p xxx // xxx is the pid of a.out
    
     test.c
    
     #include <stdio.h>
     #include <sys/types.h>
     #include <unistd.h>
    
     int main(void) {
            int count = 0;
            pid_t pid;
    
            printf("%d running\n", getpid());
            sleep(30);
            printf("running\n");
    
            pid = fork();
            if (pid == -1) {
                    printf("fork error\n");
                    return 1;
            }
            if (pid == 0) {
                    /* child: spin forever */
                    while (1) {
                            count++;
                    }
            } else {
                    /* parent: spin forever */
                    while (1) {
                            count++;
                    }
            }
     }
    
    The testcase first opens an LBR event, so it will allocate task_ctx_data,
    and then opens tracepoint and software events, so the parent context will
    have 3 different perf_event_pmu_contexts. On inheritance, child ctx will
    insert the perf_event_pmu_context in another order and the warning will
    trigger.
    
    [ mingo: Tidied up the changelog. ]
    
    Fixes: bd2756811766 ("perf: Rewrite core context handling")
    Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
    Link: https://lore.kernel.org/r/20250122073356.1824736-1-luogengkun@huaweicloud.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

perf/x86: Fix low freqency setting issue [+ + +]

Author: Kan Liang <kan.liang@linux.intel.com>
Date:   Fri Jan 17 07:19:11 2025 -0800

    perf/x86: Fix low freqency setting issue
    
    commit 88ec7eedbbd21cad38707620ad6c48a4e9a87c18 upstream.
    
    Perf doesn't work at low frequencies:
    
      $ perf record -e cpu_core/instructions/ppp -F 120
      Error:
      The sys_perf_event_open() syscall returned with 22 (Invalid argument)
      for event (cpu_core/instructions/ppp).
      "dmesg | grep -i perf" may provide additional information.
    
    The limit_period() check avoids a low sampling period on a counter. It
    doesn't intend to limit the frequency.
    
    The check in the x86_pmu_hw_config() should be limited to non-freq mode.
    The attr.sample_period and attr.sample_freq fields share a union, so
    attr.sample_period should not be used to indicate the frequency mode.
    
    Fixes: c46e665f0377 ("perf/x86: Add INST_RETIRED.ALL workarounds")
    Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20250117151913.3043942-1-kan.liang@linux.intel.com
    Closes: https://lore.kernel.org/lkml/20250115154949.3147-1-ravi.bangoria@amd.com/
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

phy: exynos5-usbdrd: fix MPLL_MULTIPLIER and SSC_REFCLKSEL masks in refclk [+ + +]

Author: Kaustabh Chakraborty <kauschluss@disroot.org>
Date:   Sun Feb 9 00:29:30 2025 +0530

    phy: exynos5-usbdrd: fix MPLL_MULTIPLIER and SSC_REFCLKSEL masks in refclk
    
    commit e2158c953c973adb49383ddea2504faf08d375b7 upstream.
    
    In exynos5_usbdrd_{pipe3,utmi}_set_refclk(), the masks
    PHYCLKRST_MPLL_MULTIPLIER_MASK and PHYCLKRST_SSC_REFCLKSEL_MASK are not
    inverted when applied to the register values. Fix it.
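    
    The difference between the two forms can be sketched as follows. The
    field position, width, and function names are hypothetical and do not
    reflect the PHY's register layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 7-bit field at bit 11, for illustration only. */
#define FIELD_SHIFT 11
#define FIELD_MASK  (0x7fu << FIELD_SHIFT)

/* Replace one bit field in `reg` with `val`, leaving other bits alone. */
static uint32_t set_field(uint32_t reg, uint32_t val)
{
    assert((val & ~0x7fu) == 0);
    reg &= ~FIELD_MASK;        /* clear the field: note the inversion */
    reg |= val << FIELD_SHIFT; /* then write the new value */
    return reg;
}

/* Buggy form: ANDing with the mask itself wipes every other bit instead. */
static uint32_t set_field_buggy(uint32_t reg, uint32_t val)
{
    reg &= FIELD_MASK;
    reg |= val << FIELD_SHIFT;
    return reg;
}
```

    With the uninverted mask, the old field contents survive and are ORed
    with the new value, while all unrelated bits are lost.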
    
    Cc: stable@vger.kernel.org
    Fixes: 59025887fb08 ("phy: Add new Exynos5 USB 3.0 PHY driver")
    Signed-off-by: Kaustabh Chakraborty <kauschluss@disroot.org>
    Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
    Reviewed-by: Anand Moon <linux.amoon@gmail.com>
    Link: https://lore.kernel.org/r/20250209-exynos5-usbdrd-masks-v1-1-4f7f83f323d7@disroot.org
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

phy: rockchip: naneng-combphy: compatible reset with old DT [+ + +]

Author: Chukun Pan <amadeus@jmu.edu.cn>
Date:   Mon Jan 6 18:00:01 2025 +0800

    phy: rockchip: naneng-combphy: compatible reset with old DT
    
    [ Upstream commit 3126ea9be66b53e607f87f067641ba724be24181 ]
    
    The device tree of RK3568 did not specify reset-names before.
    So add fallback to old behaviour to be compatible with old DT.
    
    Fixes: fbcbffbac994 ("phy: rockchip: naneng-combphy: fix phy reset")
    Cc: Jianfeng Liu <liujianfeng1994@gmail.com>
    Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn>
    Reviewed-by: Jonas Karlman <jonas@kwiboo.se>
    Link: https://lore.kernel.org/r/20250106100001.1344418-2-amadeus@jmu.edu.cn
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

phy: tegra: xusb: reset VBUS & ID OVERRIDE [+ + +]

Author: BH Hsieh <bhsieh@nvidia.com>
Date:   Wed Jan 22 18:59:43 2025 +0800

    phy: tegra: xusb: reset VBUS & ID OVERRIDE
    
    commit 55f1a5f7c97c3c92ba469e16991a09274410ceb7 upstream.
    
    VBUS_OVERRIDE and ID_OVERRIDE have been observed to be programmed
    with unexpected values before the XUSB PADCTL driver runs; this
    can also occur in virtualization scenarios.
    
    For example, UEFI firmware programs ID_OVERRIDE=GROUNDED to set
    a type-c port to host mode and leaves that value in place for the kernel.
    If the type-c port is connected to a usb host, the errors below can be
    observed right after the usb host mode driver is probed. The errors
    persist until the usb role class driver detects the type-c port
    as device mode and notifies the usb device mode driver to set both
    ID_OVERRIDE and VBUS_OVERRIDE to the correct values via the XUSB PADCTL
    driver.
    
    [  173.765814] usb usb3-port2: Cannot enable. Maybe the USB cable is bad?
    [  173.765837] usb usb3-port2: config error
    
    Taking virtualization into account, asserting XUSB PADCTL
    reset would break XUSB functions used by other guest OS,
    hence only reset VBUS & ID OVERRIDE of the port in
    utmi_phy_init.
    
    Fixes: bbf711682cd5 ("phy: tegra: xusb: Add Tegra186 support")
    Cc: stable@vger.kernel.org
    Change-Id: Ic63058d4d49b4a1f8f9ab313196e20ad131cc591
    Signed-off-by: BH Hsieh <bhsieh@nvidia.com>
    Signed-off-by: Henry Lin <henryl@nvidia.com>
    Link: https://lore.kernel.org/r/20250122105943.8057-1-henryl@nvidia.com
    Signed-off-by: Vinod Koul <vkoul@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rcuref: Plug slowpath race in rcuref_put() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Sun Jan 19 00:55:32 2025 +0100

    rcuref: Plug slowpath race in rcuref_put()
    
    commit b9a49520679e98700d3d89689cc91c08a1c88c1d upstream.
    
    Kernel test robot reported an "imbalanced put" in the rcuref_put() slow
    path, which turned out to be a false positive. Consider the following race:
    
                ref  = 0 (via rcuref_init(ref, 1))
     T1                                      T2
     rcuref_put(ref)
     -> atomic_add_negative_release(-1, ref)                                         # ref -> 0xffffffff
     -> rcuref_put_slowpath(ref)
                                             rcuref_get(ref)
                                             -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                               -> return true;                       # ref -> 0
    
                                             rcuref_put(ref)
                                             -> atomic_add_negative_release(-1, ref) # ref -> 0xffffffff
                                             -> rcuref_put_slowpath()
    
        -> cnt = atomic_read(&ref->refcnt);                                          # cnt -> 0xffffffff / RCUREF_NOREF
        -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD))              # ref -> 0xe0000000 / RCUREF_DEAD
           -> return true
                                               -> cnt = atomic_read(&ref->refcnt);   # cnt -> 0xe0000000 / RCUREF_DEAD
                                               -> if (cnt > RCUREF_RELEASED)         # 0xe0000000 > 0xc0000000
                                                 -> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")
    
    The problem is the additional read in the slow path (after it
    decremented to RCUREF_NOREF) which can happen after the counter has been
    marked RCUREF_DEAD.
    
    Prevent this by reusing the return value of the decrement. Now every "final"
    put uses RCUREF_NOREF in the slow path and attempts the final cmpxchg() to
    RCUREF_DEAD.
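    
    The fix can be sketched with C11 atomics. This is a simplified model
    under stated assumptions, not the kernel's rcuref implementation: the
    sk_ names are hypothetical, only the NOREF and DEAD states are
    modelled, and default sequentially consistent ordering replaces the
    kernel's release ordering. The point it shows is that the slow path
    receives the value produced by the decrement instead of reading the
    counter a second time.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SK_NOREF 0xFFFFFFFFu /* counter dropped below "one reference" */
#define SK_DEAD  0xE0000000u /* object is marked for release */

struct sk_ref {
	atomic_uint refcnt; /* 0 means exactly one reference is held */
};

/* Slow path: works on the value handed in, never re-reads the counter. */
bool sk_put_slowpath(struct sk_ref *ref, unsigned int cnt)
{
	if (cnt == SK_NOREF)
		/* Only the caller whose cmpxchg succeeds may free the object. */
		return atomic_compare_exchange_strong(&ref->refcnt, &cnt, SK_DEAD);
	return false; /* other states are omitted from this sketch */
}

/* Returns true when the caller dropped the last reference. */
bool sk_put(struct sk_ref *ref)
{
	/* fetch_sub returns the old value; subtract 1 to get the new one. */
	unsigned int cnt = atomic_fetch_sub(&ref->refcnt, 1u) - 1u;

	if (cnt <= 0x7FFFFFFFu)
		return false; /* references remain */
	return sk_put_slowpath(ref, cnt);
}
```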
    
    [ bigeasy: Add changelog ]
    
    Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
    Reported-by: kernel test robot <oliver.sang@intel.com>
    Debugged-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
    Cc: stable@vger.kernel.org
    Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

RDMA/mana_ib: Allocate PAGE aligned doorbell index [+ + +]

Author: Konstantin Taranov <kotaranov@microsoft.com>
Date:   Wed Feb 5 02:30:05 2025 -0800

    RDMA/mana_ib: Allocate PAGE aligned doorbell index
    
    [ Upstream commit 29b7bb98234cc287cebef9bccf638c2e3f39be71 ]
    
    Allocate a PAGE aligned doorbell index to ensure each process gets a
    separate PAGE sized doorbell area remapped to it in mana_ib_mmap.
    
    Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter")
    Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
    Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
    Link: https://patch.msgid.link/1738751405-15041-1-git-send-email-kotaranov@linux.microsoft.com
    Reviewed-by: Long Li <longli@microsoft.com>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

RDMA/mlx5: Fix AH static rate parsing [+ + +]

Author: Patrisious Haddad <phaddad@nvidia.com>
Date:   Mon Feb 10 13:32:39 2025 +0200

    RDMA/mlx5: Fix AH static rate parsing
    
    [ Upstream commit c534ffda781f44a1c6ac25ef6e0e444da38ca8af ]
    
    Previously static rate wasn't translated according to our PRM but simply
    used the 4 lower bytes.
    
    Correctly translate static rate value passed in AH creation attribute
    according to our PRM expected values.
    
    In addition, change the 800GB mapping to zero, which is the PRM
    specified value.
    
    Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters")
    Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
    Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
    Link: https://patch.msgid.link/18ef4cc5396caf80728341eb74738cd777596f60.1739187089.git.leon@kernel.org
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

RDMA/mlx5: Fix bind QP error cleanup flow [+ + +]

Author: Patrisious Haddad <phaddad@nvidia.com>
Date:   Thu Feb 20 08:47:10 2025 +0200

    RDMA/mlx5: Fix bind QP error cleanup flow
    
    [ Upstream commit e1a0bdbdfdf08428f0ede5ae49c7f4139ac73ef5 ]
    
    When there is a failure during bind QP, the cleanup flow destroys the
    counter regardless of whether this call created it. That is problematic
    because a counter created elsewhere could still be in use.
    
    Fix that by destroying the counter only if it was created during this call.
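    
    The ownership rule can be sketched in C as follows. All names here
    (sk_counter, sk_bind_qp, the bind_result parameter that simulates the
    outcome of the bind step) are hypothetical and do not come from the
    mlx5 driver; the sketch only illustrates "undo what this call set up,
    nothing more".

```c
#include <stdbool.h>
#include <stddef.h>

struct sk_counter {
	int  users;
	bool destroyed;
};

/*
 * Bind to *slot, creating the counter in `storage` first if the slot is
 * empty. `bind_result` simulates the outcome of the bind step (0 = ok).
 * On failure, the counter is destroyed only if this call created it.
 */
int sk_bind_qp(struct sk_counter **slot, struct sk_counter *storage,
	       int bind_result)
{
	bool created_here = false;

	if (*slot == NULL) {
		storage->users = 0;
		storage->destroyed = false;
		*slot = storage;
		created_here = true;
	}

	if (bind_result != 0) {
		if (created_here) {
			(*slot)->destroyed = true;
			*slot = NULL;
		}
		return bind_result;
	}

	(*slot)->users++;
	return 0;
}
```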
    
    Fixes: 45842fc627c7 ("IB/mlx5: Support statistic q counter configuration")
    Signed-off-by: Patrisious Haddad <phaddad@nvidia.com>
    Reviewed-by: Mark Zhang <markzhang@nvidia.com>
    Link: https://patch.msgid.link/25dfefddb0ebefa668c32e06a94d84e3216257cf.1740033937.git.leon@kernel.org
    Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
    Signed-off-by: Leon Romanovsky <leon@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

Revert "rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads" [+ + +]

Author: Tomas Glozar <tglozar@redhat.com>
Date:   Fri Feb 28 14:57:06 2025 +0100

    Revert "rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads"
    
    This reverts commit 83b74901bdc9b58739193b8ee6989254391b6ba7.
    
    The commit breaks rtla build, since params->kernel_workload is not
    present on 6.6-stable.
    
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Revert "rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads" [+ + +]

Author: Tomas Glozar <tglozar@redhat.com>
Date:   Fri Feb 28 14:57:05 2025 +0100

    Revert "rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads"
    
    This reverts commit 41955b6c268154f81e34f9b61cf8156eec0730c0.
    
    The commit breaks rtla build, since params->kernel_workload is not
    present on 6.6-stable.
    
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv/futex: sign extend compare value in atomic cmpxchg [+ + +]

Author: Andreas Schwab <schwab@suse.de>
Date:   Mon Feb 3 11:06:00 2025 +0100

    riscv/futex: sign extend compare value in atomic cmpxchg
    
    commit 599c44cd21f4967774e0acf58f734009be4aea9a upstream.
    
    Make sure the compare value in the lr/sc loop is sign extended to match
    what lr.w does.  Fortunately, because the compiler keeps the register
    contents sign extended anyway, the lack of the explicit extension has
    not resulted in wrong code so far, but this cannot be relied upon.
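    
    Why the extension matters can be modelled in portable C. This is an
    illustrative sketch with hypothetical sk_ helpers, not the futex code:
    on RV64, lr.w yields a sign-extended 64-bit value, so a compare value
    whose upper 32 bits are zero can never match once bit 31 is set. The
    narrowing casts assume the usual two's-complement wrap-around that GCC
    and Clang provide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of lr.w on RV64: load 32 bits and sign-extend them to 64 bits. */
int64_t sk_lr_w(uint32_t mem)
{
	return (int64_t)(int32_t)mem;
}

/* Compare value left zero-extended: mismatches whenever bit 31 is set. */
bool sk_match_zero_extended(uint32_t mem, uint32_t oldval)
{
	return sk_lr_w(mem) == (int64_t)(uint64_t)oldval;
}

/* Compare value sign-extended first, so it matches what lr.w produced. */
bool sk_match_sign_extended(uint32_t mem, uint32_t oldval)
{
	return sk_lr_w(mem) == (int64_t)(int32_t)oldval;
}
```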
    
    Fixes: b90edb33010b ("RISC-V: Add futex support.")
    Signed-off-by: Andreas Schwab <schwab@suse.de>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/mvmfrkv2vhz.fsf@suse.de
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

riscv: KVM: Fix hart suspend status check [+ + +]

Author: Andrew Jones <ajones@ventanamicro.com>
Date:   Mon Feb 17 09:45:08 2025 +0100

    riscv: KVM: Fix hart suspend status check
    
    [ Upstream commit c7db342e3b4744688be1e27e31254c1d31a35274 ]
    
    "Not stopped" means started or suspended so we need to check for
    a single state in order to have a chance to check for each state.
    Also, we need to use target_vcpu when checking for the suspend
    state.
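    
    A simplified model of the corrected check, written as a sketch: the
    struct fields and the sk_ names are assumptions for illustration and
    do not mirror the KVM data structures. It tests for the single
    "stopped" state first and reads every field from the target hart.

```c
#include <stdbool.h>

enum sk_hsm_status { SK_STARTED, SK_STOPPED, SK_SUSPENDED };

struct sk_vcpu {
	bool stopped;  /* hart has been stopped */
	bool blocking; /* hart is blocked, e.g. after a suspend call */
};

/*
 * Test for the single "stopped" state first; whatever is left is either
 * suspended or started. Only the target's fields are consulted.
 */
enum sk_hsm_status sk_hart_status(const struct sk_vcpu *target)
{
	if (target->stopped)
		return SK_STOPPED;
	if (target->blocking)
		return SK_SUSPENDED;
	return SK_STARTED;
}
```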
    
    Fixes: 763c8bed8c05 ("RISC-V: KVM: Implement SBI HSM suspend call")
    Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
    Reviewed-by: Anup Patel <anup@brainfault.org>
    Link: https://lore.kernel.org/r/20250217084506.18763-8-ajones@ventanamicro.com
    Signed-off-by: Anup Patel <anup@brainfault.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: KVM: Fix SBI IPI error generation [+ + +]

Author: Andrew Jones <ajones@ventanamicro.com>
Date:   Mon Feb 17 09:45:10 2025 +0100

    riscv: KVM: Fix SBI IPI error generation
    
    [ Upstream commit 0611f78f83c93c000029ab01daa28166d03590ed ]
    
    When an invalid function ID of an SBI extension is used we should
    return not-supported, not invalid-param. Also, when we see that at
    least one hartid constructed from the base and mask parameters is
    invalid, then we should return invalid-param. Finally, rather than
    relying on overflowing a left shift to result in zero and then using
    that zero in a condition which [correctly] skips sending an IPI (but
    loops unnecessarily), explicitly check for overflow and exit the loop
    immediately.
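    
    The explicit overflow check can be sketched like this. The function
    and its parameters are hypothetical, and the hart ids are assumed to
    be sorted in ascending order; the sketch covers only the loop exit,
    not the error codes. In C a shift by the full register width is
    undefined, which is one more reason to test the bit index before
    shifting.

```c
#include <stddef.h>
#include <stdint.h>

#define SK_XLEN 64 /* width of the hart mask in bits */

/*
 * Count the harts selected by (hbase, hmask). `ids` must be sorted in
 * ascending order. *visited receives the index at which the loop stopped,
 * which makes the early exit observable.
 */
size_t sk_count_targets(const uint64_t *ids, size_t n, uint64_t hbase,
			uint64_t hmask, size_t *visited)
{
	size_t hits = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint64_t bit;

		if (ids[i] < hbase)
			continue;
		bit = ids[i] - hbase;
		if (bit >= SK_XLEN)
			break; /* no later id can be inside the mask */
		if (hmask & (UINT64_C(1) << bit))
			hits++;
	}
	*visited = i;
	return hits;
}
```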
    
    Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2")
    Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
    Reviewed-by: Anup Patel <anup@brainfault.org>
    Link: https://lore.kernel.org/r/20250217084506.18763-10-ajones@ventanamicro.com
    Signed-off-by: Anup Patel <anup@brainfault.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: KVM: Fix SBI TIME error generation [+ + +]

Author: Andrew Jones <ajones@ventanamicro.com>
Date:   Mon Feb 17 09:45:11 2025 +0100

    riscv: KVM: Fix SBI TIME error generation
    
    [ Upstream commit b901484852992cf3d162a5eab72251cc813ca624 ]
    
    When an invalid function ID of an SBI extension is used we should
    return not-supported, not invalid-param.
    
    Fixes: 5f862df5585c ("RISC-V: KVM: Add v0.1 replacement SBI extensions defined in v0.2")
    Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
    Reviewed-by: Anup Patel <anup@brainfault.org>
    Link: https://lore.kernel.org/r/20250217084506.18763-11-ajones@ventanamicro.com
    Signed-off-by: Anup Patel <anup@brainfault.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

RISCV: KVM: Introduce mp_state_lock to avoid lock inversion [+ + +]

Author: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Date:   Wed Apr 17 15:45:25 2024 +0800

    RISCV: KVM: Introduce mp_state_lock to avoid lock inversion
    
    [ Upstream commit 2121cadec45aaf61fa45b3aa3d99723ed4e6683a ]
    
    Documentation/virt/kvm/locking.rst advises that kvm->lock should be
    acquired outside vcpu->mutex and kvm->srcu. However, when KVM/RISC-V
    handles SBI_EXT_HSM_HART_START, the lock ordering is vcpu->mutex,
    kvm->srcu, then kvm->lock.
    
    Although the lockdep checking no longer complains about this after commit
    f0f44752f5f6 ("rcu: Annotate SRCU's update-side lockdep dependencies"),
    it's necessary to replace kvm->lock with a new dedicated lock to ensure
    only one hart can execute the SBI_EXT_HSM_HART_START call for the target
    hart simultaneously.
    
    Additionally, this patch renames "power_off" to "mp_state", which has
    two possible values. vcpu->mp_state_lock also protects accesses to
    vcpu->mp_state.
    
    Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
    Reviewed-by: Anup Patel <anup@brainfault.org>
    Link: https://lore.kernel.org/r/20240417074528.16506-2-yongxuan.wang@sifive.com
    Signed-off-by: Anup Patel <anup@brainfault.org>
    Stable-dep-of: c7db342e3b47 ("riscv: KVM: Fix hart suspend status check")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

riscv: signal: fix signal frame size [+ + +]

Author: Yong-Xuan Wang <yongxuan.wang@sifive.com>
Date:   Fri Dec 20 16:39:23 2024 +0800

    riscv: signal: fix signal frame size
    
    commit aa49bc2ca8524186ceb0811c23a7f00c3dea6987 upstream.
    
    The signal context of certain RISC-V extensions will be appended after
    struct __riscv_extra_ext_header, which already includes an empty context
    header. Therefore, there is no need to preserve a separate hdr for the
    END of signal context.
    
    Fixes: 8ee0b41898fa ("riscv: signal: Add sigcontext save/restore for vector")
    Signed-off-by: Yong-Xuan Wang <yongxuan.wang@sifive.com>
    Reviewed-by: Zong Li <zong.li@sifive.com>
    Reviewed-by: Andy Chiu <AndybnAC@gmail.com>
    Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20241220083926.19453-2-yongxuan.wang@sifive.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm [+ + +]

Author: Stafford Horne <shorne@gmail.com>
Date:   Tue Jan 14 17:07:21 2025 +0000

    rseq/selftests: Fix riscv rseq_offset_deref_addv inline asm
    
    commit 713e788c0e07e185fd44dd581f74855ef149722f upstream.
    
    When working on OpenRISC support for restartable sequences I noticed
    and fixed these two issues with the riscv support bits.
    
     1 The 'inc' argument to RSEQ_ASM_OP_R_DEREF_ADDV was being implicitly
       passed to the macro.  Fix this by adding 'inc' to the list of macro
       arguments.
     2 The inline asm input constraints for 'inc' and 'off' use "er".  The
       riscv gcc port does not have an "e" constraint; this looks to have
       been copied from the x86 port.  Fix this by just using an "r"
       constraint.
    
    I have only compile tested this for riscv.  However, I use the same
    fixes in the OpenRISC rseq selftests, where everything passes with no
    issues.
    
    Fixes: 171586a6ab66 ("selftests/rseq: riscv: Template memory ordering and percpu access mode")
    Signed-off-by: Stafford Horne <shorne@gmail.com>
    Tested-by: Charlie Jenkins <charlie@rivosinc.com>
    Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
    Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Acked-by: Shuah Khan <skhan@linuxfoundation.org>
    Cc: stable@vger.kernel.org
    Link: https://lore.kernel.org/r/20250114170721.3613280-1-shorne@gmail.com
    Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads [+ + +]

Author: Tomas Glozar <tglozar@redhat.com>
Date:   Fri Feb 28 14:57:07 2025 +0100

    rtla/timerlat_hist: Set OSNOISE_WORKLOAD for kernel threads
    
    commit d8d866171a414ed88bd0d720864095fd75461134 upstream.
    
    When using rtla timerlat with userspace threads (-u or -U), rtla
    disables the OSNOISE_WORKLOAD option in
    /sys/kernel/tracing/osnoise/options. This option is not re-enabled in a
    subsequent run with kernel-space threads, leading to rtla collecting no
    results if the previous run exited abnormally:
    
    $ rtla timerlat hist -u
    ^\Quit (core dumped)
    $ rtla timerlat hist -k -d 1s
    Index
    over:
    count:
    min:
    avg:
    max:
    ALL:        IRQ       Thr       Usr
    count:        0         0         0
    min:          -         -         -
    avg:          -         -         -
    max:          -         -         -
    
    The issue persists until OSNOISE_WORKLOAD is set manually by running:
    $ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
    
    Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if
    available to fix the issue.
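    
    For illustration, the manual workaround shown above amounts to writing
    the option name into the options file. The helper below is a
    hypothetical sketch in C and is not how rtla itself is implemented;
    the function name and its error convention are assumptions.

```c
#include <stdio.h>

/*
 * Write an option name to an osnoise-style options file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
int sk_set_option(const char *path, const char *option)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fputs(option, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

    Calling sk_set_option("/sys/kernel/tracing/osnoise/options",
    "OSNOISE_WORKLOAD") as root would have the same effect as the echo
    command above.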
    
    Cc: stable@vger.kernel.org
    Cc: John Kacur <jkacur@redhat.com>
    Cc: Luis Goncalves <lgoncalv@redhat.com>
    Link: https://lore.kernel.org/20250107144823.239782-3-tglozar@redhat.com
    Fixes: ed774f7481fa ("rtla/timerlat_hist: Add timerlat user-space support")
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    [ params->kernel_workload does not exist in 6.6, use
    !params->user_hist ]
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads [+ + +]

Author: Tomas Glozar <tglozar@redhat.com>
Date:   Fri Feb 28 14:57:08 2025 +0100

    rtla/timerlat_top: Set OSNOISE_WORKLOAD for kernel threads
    
    commit 217f0b1e990e30a1f06f6d531fdb4530f4788d48 upstream.
    
    When using rtla timerlat with userspace threads (-u or -U), rtla
    disables the OSNOISE_WORKLOAD option in
    /sys/kernel/tracing/osnoise/options. This option is not re-enabled in a
    subsequent run with kernel-space threads, leading to rtla collecting no
    results if the previous run exited abnormally:
    
    $ rtla timerlat top -u
    ^\Quit (core dumped)
    $ rtla timerlat top -k -d 1s
                                         Timer Latency
      0 00:00:01   |          IRQ Timer Latency (us)        |         Thread Timer Latency (us)
    CPU COUNT      |      cur       min       avg       max |      cur       min       avg       max
    
    The issue persists until OSNOISE_WORKLOAD is set manually by running:
    $ echo OSNOISE_WORKLOAD > /sys/kernel/tracing/osnoise/options
    
    Set OSNOISE_WORKLOAD when running rtla with kernel-space threads if
    available to fix the issue.
    
    Cc: stable@vger.kernel.org
    Cc: John Kacur <jkacur@redhat.com>
    Cc: Luis Goncalves <lgoncalv@redhat.com>
    Link: https://lore.kernel.org/20250107144823.239782-4-tglozar@redhat.com
    Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support")
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    [ params->kernel_workload does not exist in 6.6, use
    !params->user_top ]
    Signed-off-by: Tomas Glozar <tglozar@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

rxrpc: rxperf: Fix missing decoding of terminal magic cookie [+ + +]

Author: David Howells <dhowells@redhat.com>
Date:   Tue Feb 18 19:22:44 2025 +0000

    rxrpc: rxperf: Fix missing decoding of terminal magic cookie
    
    [ Upstream commit c34d999ca3145d9fe858258cc3342ec493f47d2e ]
    
    The rxperf RPCs seem to have a magic cookie at the end of the request
    that the unmarshalling of the request failed to take into account.
    Fix the rxperf code to expect this.
    
    Fixes: 75bfdbf2fca3 ("rxrpc: Implement an in-kernel rxperf server for testing purposes")
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Marc Dionne <marc.dionne@auristor.com>
    cc: Simon Horman <horms@kernel.org>
    cc: linux-afs@lists.infradead.org
    Link: https://patch.msgid.link/20250218192250.296870-2-dhowells@redhat.com
    Signed-off-by: Jakub Kicinski <kuba@kernel.org>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

sched/core: Prevent rescheduling when interrupts are disabled [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Dec 16 14:20:56 2024 +0100

    sched/core: Prevent rescheduling when interrupts are disabled
    
    commit 82c387ef7568c0d96a918a5a78d9cad6256cfa15 upstream.
    
    David reported a warning observed while loop testing kexec jump:
    
      Interrupts enabled after irqrouter_resume+0x0/0x50
      WARNING: CPU: 0 PID: 560 at drivers/base/syscore.c:103 syscore_resume+0x18a/0x220
       kernel_kexec+0xf6/0x180
       __do_sys_reboot+0x206/0x250
       do_syscall_64+0x95/0x180
    
    The corresponding interrupt flag trace:
    
      hardirqs last  enabled at (15573): [<ffffffffa8281b8e>] __up_console_sem+0x7e/0x90
      hardirqs last disabled at (15580): [<ffffffffa8281b73>] __up_console_sem+0x63/0x90
    
    That means __up_console_sem() was invoked with interrupts enabled. Further
    instrumentation revealed that in the interrupt disabled section of kexec
    jump one of the syscore_suspend() callbacks woke up a task, which set the
    NEED_RESCHED flag. A later callback in the resume path invoked
    cond_resched() which in turn led to the invocation of the scheduler:
    
      __cond_resched+0x21/0x60
      down_timeout+0x18/0x60
      acpi_os_wait_semaphore+0x4c/0x80
      acpi_ut_acquire_mutex+0x3d/0x100
      acpi_ns_get_node+0x27/0x60
      acpi_ns_evaluate+0x1cb/0x2d0
      acpi_rs_set_srs_method_data+0x156/0x190
      acpi_pci_link_set+0x11c/0x290
      irqrouter_resume+0x54/0x60
      syscore_resume+0x6a/0x200
      kernel_kexec+0x145/0x1c0
      __do_sys_reboot+0xeb/0x240
      do_syscall_64+0x95/0x180
    
    This is a long standing problem, which probably got more visible with
    the recent printk changes. Something does a task wakeup and the
    scheduler sets the NEED_RESCHED flag. cond_resched() sees it set and
    invokes schedule() from a completely bogus context. The scheduler
    enables interrupts after context switching, which causes the above
    warning at the end.
    
    Quite a few of the code paths in syscore_suspend()/resume() can
    trigger a wakeup with exactly the same consequences. They might not
    have done so yet, but as they share a lot of code with normal operations
    it's just a question of time.
    
    The problem only affects the PREEMPT_NONE and PREEMPT_VOLUNTARY scheduling
    models. Full preemption is not affected as cond_resched() is disabled and
    the preemption check preemptible() takes the interrupt disabled flag into
    account.
    
    Cure the problem by adding a corresponding check into cond_resched().
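    
    The added check can be modelled in a few lines of C. This is a sketch
    over a hypothetical sk_cpu_state struct, not scheduler code: it only
    shows that a pending reschedule request is left alone while interrupts
    are disabled and is acted upon once they are enabled again.

```c
#include <stdbool.h>

struct sk_cpu_state {
	bool need_resched;   /* a wakeup asked for a reschedule */
	bool irqs_disabled;  /* interrupts are currently off */
	int  schedule_calls; /* how often the scheduler was entered */
};

/* Returns true if the scheduler was invoked. */
bool sk_cond_resched(struct sk_cpu_state *cpu)
{
	if (cpu->need_resched && !cpu->irqs_disabled) {
		cpu->schedule_calls++;
		cpu->need_resched = false;
		return true;
	}
	return false; /* the flag stays set and is honoured later */
}
```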
    
    Reported-by: David Woodhouse <dwmw@amazon.co.uk>
    Suggested-by: Peter Zijlstra <peterz@infradead.org>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Tested-by: David Woodhouse <dwmw@amazon.co.uk>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: stable@vger.kernel.org
    Closes: https://lore.kernel.org/all/7717fe2ac0ce5f0a2c43fdab8b11f4483d54a2a4.camel@infradead.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: core: Clear driver private data when retrying request [+ + +]

Author: Ye Bin <yebin10@huawei.com>
Date:   Mon Feb 17 10:16:28 2025 +0800

    scsi: core: Clear driver private data when retrying request
    
    [ Upstream commit dce5c4afd035e8090a26e5d776b1682c0e649683 ]
    
    After commit 1bad6c4a57ef ("scsi: zero per-cmd private driver data for each
    MQ I/O"), the xen-scsifront/virtio_scsi/snic drivers all removed code that
    explicitly zeroed driver-private command data.
    
    In combination with commit 464a00c9e0ad ("scsi: core: Kill DRIVER_SENSE"),
    after virtio_scsi performs a capacity expansion, the first request will
    return a unit attention to indicate that the capacity has changed. And then
    the original command is retried. As driver-private command data was not
    cleared, the request would return UA again and eventually time out and fail.
    
    Zero driver-private command data when a request is retried.
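    
    A minimal sketch of the remedy, assuming the driver-private area is
    laid out directly behind the command structure in one allocation. The
    sk_ types and helpers are hypothetical and much smaller than the real
    SCSI structures; the sketch only shows stale private bytes being wiped
    before a retry.

```c
#include <stdlib.h>
#include <string.h>

struct sk_cmd {
	int retries;
	/* driver-private bytes follow this struct in the same allocation */
};

/* The private area starts right after the command structure. */
void *sk_cmd_priv(struct sk_cmd *cmd)
{
	return cmd + 1;
}

/* Allocate a zeroed command with priv_size private bytes behind it. */
struct sk_cmd *sk_cmd_alloc(size_t priv_size)
{
	return calloc(1, sizeof(struct sk_cmd) + priv_size);
}

/* Prepare a command for resubmission: stale private state is wiped. */
void sk_cmd_prepare_retry(struct sk_cmd *cmd, size_t priv_size)
{
	cmd->retries++;
	memset(sk_cmd_priv(cmd), 0, priv_size);
}
```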
    
    Fixes: f7de50da1479 ("scsi: xen-scsifront: Remove code that zeroes driver-private command data")
    Fixes: c2bb87318baa ("scsi: virtio_scsi: Remove code that zeroes driver-private command data")
    Fixes: c3006a926468 ("scsi: snic: Remove code that zeroes driver-private command data")
    Signed-off-by: Ye Bin <yebin10@huawei.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20250217021628.2929248-1-yebin@huaweicloud.com
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: Add UFS RTC support [+ + +]

Author: Bean Huo <beanhuo@micron.com>
Date:   Tue Dec 12 23:08:24 2023 +0100

    scsi: ufs: core: Add UFS RTC support
    
    [ Upstream commit 6bf999e0eb41850d5c857102535d5c53b2ede224 ]
    
    Add Real Time Clock (RTC) support for UFS device. This enhancement is
    crucial for the internal maintenance operations of the UFS device. The
    patch enables the device to handle both absolute and relative time
    information. Furthermore, it includes a periodic task to update the RTC in
    accordance with the UFS Spec, ensuring the accuracy of RTC information for
    the device's internal processes.
    
    RTC and qTimestamp serve distinct purposes. The RTC provides a coarse level
    of granularity with, at best, approximate single-second resolution. This
    makes the RTC well-suited for the device to determine the approximate age
    of programmed blocks after being updated by the host. On the other hand,
    qTimestamp offers nanosecond granularity and is specifically designed for
    synchronizing Device Error Log entries with corresponding host-side logs.
    
    Given that the RTC has been a standard feature since UFS Spec 2.0, and
    qTimestamp was introduced in UFS Spec 4.0, the majority of UFS devices
    currently on the market rely on RTC. Therefore, it is advisable to continue
    supporting RTC in the Linux kernel. This ensures compatibility with the
    prevailing UFS device implementations and facilitates seamless integration
    with existing hardware.  By maintaining support for RTC, we ensure broad
    compatibility and avoid potential issues arising from deviations in device
    specifications across different UFS versions.
    
    Signed-off-by: Bean Huo <beanhuo@micron.com>
    Signed-off-by: Mike Bi <mikebi@micron.com>
    Signed-off-by: Luca Porzio <lporzio@micron.com>
    Link: https://lore.kernel.org/r/20231212220825.85255-3-beanhuo@iokpp.de
    Acked-by: Avri Altman <avri.altman@wdc.com>
    Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Stable-dep-of: 4fa382be4304 ("scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: Add ufshcd_is_ufs_dev_busy() [+ + +]

Author: Bean Huo <beanhuo@micron.com>
Date:   Tue Dec 12 23:08:23 2023 +0100

    scsi: ufs: core: Add ufshcd_is_ufs_dev_busy()
    
    [ Upstream commit 9fa268875ca4ff5cad0c1b957388a0aef39920c3 ]
    
    Add an inline helper that reports whether the UFS device is busy.
    
    Signed-off-by: Bean Huo <beanhuo@micron.com>
    Link: https://lore.kernel.org/r/20231212220825.85255-2-beanhuo@iokpp.de
    Reviewed-by: Avri Altman <avri.altman@wdc.com>
    Reviewed-by: Thomas Weißschuh <linux@weissschuh.net>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Stable-dep-of: 4fa382be4304 ("scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: bsg: Fix crash when arpmb command fails [+ + +]

Author: Arthur Simchaev <arthur.simchaev@sandisk.com>
Date:   Thu Feb 20 16:20:39 2025 +0200

    scsi: ufs: core: bsg: Fix crash when arpmb command fails
    
    commit f27a95845b01e86d67c8b014b4f41bd3327daa63 upstream.
    
    If the device doesn't support arpmb we'll crash due to copying user data in
    bsg_transport_sg_io_fn().
    
    In the case where ufs_bsg_exec_advanced_rpmb_req() returns an error, do not
    set the job's reply_len.
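    
    The rule can be sketched as follows. The sk_job struct and
    sk_job_finish() are hypothetical stand-ins, not the bsg or ufs_bsg
    API: a reply length is published only when the request succeeded, so
    the transport has nothing to copy back after an error.

```c
#include <stddef.h>

struct sk_job {
	size_t reply_len; /* bytes the transport copies back to user space */
	int    result;
};

/* Finish a job; `err` is the outcome of the (hypothetical) request. */
void sk_job_finish(struct sk_job *job, int err, size_t reply_size)
{
	job->result = err;
	if (err == 0)
		job->reply_len = reply_size; /* a valid reply exists */
	/* on error, reply_len is deliberately left unchanged */
}
```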
    
    Memory crash backtrace:
    3,1290,531166405,-;ufshcd 0000:00:12.5: ARPMB OP failed: error code -22
    4,1308,531166555,-;Call Trace:
    4,1309,531166559,-; <TASK>
    4,1310,531166565,-; ? show_regs+0x6d/0x80
    4,1311,531166575,-; ? die+0x37/0xa0
    4,1312,531166583,-; ? do_trap+0xd4/0xf0
    4,1313,531166593,-; ? do_error_trap+0x71/0xb0
    4,1314,531166601,-; ? usercopy_abort+0x6c/0x80
    4,1315,531166610,-; ? exc_invalid_op+0x52/0x80
    4,1316,531166622,-; ? usercopy_abort+0x6c/0x80
    4,1317,531166630,-; ? asm_exc_invalid_op+0x1b/0x20
    4,1318,531166643,-; ? usercopy_abort+0x6c/0x80
    4,1319,531166652,-; __check_heap_object+0xe3/0x120
    4,1320,531166661,-; check_heap_object+0x185/0x1d0
    4,1321,531166670,-; __check_object_size.part.0+0x72/0x150
    4,1322,531166679,-; __check_object_size+0x23/0x30
    4,1323,531166688,-; bsg_transport_sg_io_fn+0x314/0x3b0
    
    Fixes: 6ff265fc5ef6 ("scsi: ufs: core: bsg: Add advanced RPMB support in ufs_bsg")
    Cc: stable@vger.kernel.org
    Reviewed-by: Bean Huo <beanhuo@micron.com>
    Signed-off-by: Arthur Simchaev <arthur.simchaev@sandisk.com>
    Link: https://lore.kernel.org/r/20250220142039.250992-1-arthur.simchaev@sandisk.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: ufs: core: Cancel RTC work during ufshcd_remove() [+ + +]

Author: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Date:   Mon Nov 11 23:18:30 2024 +0530

    scsi: ufs: core: Cancel RTC work during ufshcd_remove()
    
    commit 1695c4361d35b7bdadd7b34f99c9c07741e181e5 upstream.
    
    Currently, RTC work is only cancelled during __ufshcd_wl_suspend(). When
    ufshcd is removed in ufshcd_remove(), RTC work is not cancelled. Due to
    this, any further trigger of the RTC work after ufshcd_remove() would
    result in a NULL pointer dereference as below:
    
    Unable to handle kernel NULL pointer dereference at virtual address 00000000000002a4
    Workqueue: events ufshcd_rtc_work
    Call trace:
     _raw_spin_lock_irqsave+0x34/0x8c
     pm_runtime_get_if_active+0x24/0xb4
     ufshcd_rtc_work+0x124/0x19c
     process_scheduled_works+0x18c/0x2d8
     worker_thread+0x144/0x280
     kthread+0x11c/0x128
     ret_from_fork+0x10/0x20
    
    Since RTC work accesses the ufshcd internal structures, it should be cancelled
    when ufshcd is removed. So do that in ufshcd_remove(), as per the order in
    ufshcd_init().
    
    Cc: stable@vger.kernel.org # 6.8
    Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
    Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
    Link: https://lore.kernel.org/r/20241111-ufs_bug_fix-v1-1-45ad8b62f02e@linaro.org
    Reviewed-by: Peter Wang <peter.wang@mediatek.com>
    Reviewed-by: Bean Huo <beanhuo@micron.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: ufs: core: Fix another deadlock during RTC update [+ + +]

Author: Peter Wang <peter.wang@mediatek.com>
Date:   Thu Oct 24 09:54:53 2024 +0800

    scsi: ufs: core: Fix another deadlock during RTC update
    
    commit cb7e509c4e0197f63717fee54fb41c4990ba8d3a upstream.
    
    If ufshcd_rtc_work calls ufshcd_rpm_put_sync() and the pm's usage_count
    is 0, we will enter the runtime suspend callback.  However, the runtime
    suspend callback will wait to flush ufshcd_rtc_work, causing a deadlock.
    
    Replace ufshcd_rpm_put_sync() with ufshcd_rpm_put() to avoid the
    deadlock.
    
    Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
    Cc: stable@vger.kernel.org #6.11.x
    Signed-off-by: Peter Wang <peter.wang@mediatek.com>
    Link: https://lore.kernel.org/r/20241024015453.21684-1-peter.wang@mediatek.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: ufs: core: Fix deadlock during RTC update [+ + +]

Author: Peter Wang <peter.wang@mediatek.com>
Date:   Mon Jul 15 14:38:31 2024 +0800

    scsi: ufs: core: Fix deadlock during RTC update
    
    commit 3911af778f208e5f49d43ce739332b91e26bc48e upstream.
    
    There is a deadlock when runtime suspend waits for the flush of RTC work,
    and the RTC work calls ufshcd_rpm_get_sync() to wait for runtime resume.
    
    Here is the deadlock backtrace:
    
    kworker/0:1     D 4892.876354 10 10971 4859 0x4208060 0x8 10 0 120 670730152367
    ptr            f0ffff80c2e40000 0 1 0x00000001 0x000000ff 0x000000ff 0x000000ff
    <ffffffee5e71ddb0> __switch_to+0x1a8/0x2d4
    <ffffffee5e71e604> __schedule+0x684/0xa98
    <ffffffee5e71ea60> schedule+0x48/0xc8
    <ffffffee5e725f78> schedule_timeout+0x48/0x170
    <ffffffee5e71fb74> do_wait_for_common+0x108/0x1b0
    <ffffffee5e71efe0> wait_for_completion+0x44/0x60
    <ffffffee5d6de968> __flush_work+0x39c/0x424
    <ffffffee5d6decc0> __cancel_work_sync+0xd8/0x208
    <ffffffee5d6dee2c> cancel_delayed_work_sync+0x14/0x28
    <ffffffee5e2551b8> __ufshcd_wl_suspend+0x19c/0x480
    <ffffffee5e255fb8> ufshcd_wl_runtime_suspend+0x3c/0x1d4
    <ffffffee5dffd80c> scsi_runtime_suspend+0x78/0xc8
    <ffffffee5df93580> __rpm_callback+0x94/0x3e0
    <ffffffee5df90b0c> rpm_suspend+0x2d4/0x65c
    <ffffffee5df91448> __pm_runtime_suspend+0x80/0x114
    <ffffffee5dffd95c> scsi_runtime_idle+0x38/0x6c
    <ffffffee5df912f4> rpm_idle+0x264/0x338
    <ffffffee5df90f14> __pm_runtime_idle+0x80/0x110
    <ffffffee5e24ce44> ufshcd_rtc_work+0x128/0x1e4
    <ffffffee5d6e3a40> process_one_work+0x26c/0x650
    <ffffffee5d6e65c8> worker_thread+0x260/0x3d8
    <ffffffee5d6edec8> kthread+0x110/0x134
    <ffffffee5d616b18> ret_from_fork+0x10/0x20
    
    Skip updating RTC if RPM state is not RPM_ACTIVE.
    
    Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
    Cc: stable@vger.kernel.org # 6.9.x
    Signed-off-by: Peter Wang <peter.wang@mediatek.com>
    Link: https://lore.kernel.org/r/20240715063831.29792-1-peter.wang@mediatek.com
    Reviewed-by: Bean Huo <beanhuo@micron.com>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out() [+ + +]

Author: Bart Van Assche <bvanassche@acm.org>
Date:   Fri Feb 14 14:43:44 2025 -0800

    scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()
    
    [ Upstream commit 4fa382be430421e1445f9c95c4dc9b7e0949ae8a ]
    
    ufshcd_is_ufs_dev_busy(), ufshcd_print_host_state() and
    ufshcd_eh_timed_out() are used in both modes (legacy mode and MCQ mode).
    hba->outstanding_reqs only represents the outstanding requests in legacy
    mode. Hence, change hba->outstanding_reqs into scsi_host_busy(hba->host) in
    these functions.
    
    Fixes: eacb139b77ff ("scsi: ufs: core: mcq: Enable multi-circular queue")
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20250214224352.3025151-1-bvanassche@acm.org
    Reviewed-by: Peter Wang <peter.wang@mediatek.com>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: Introduce ufshcd_has_pending_tasks() [+ + +]

Author: Avri Altman <avri.altman@wdc.com>
Date:   Sun Nov 24 09:08:05 2024 +0200

    scsi: ufs: core: Introduce ufshcd_has_pending_tasks()
    
    [ Upstream commit e738ba458e7539be1757dcdf85835a5c7b11fad4 ]
    
    Prepare to remove hba->clk_gating.active_reqs check from
    ufshcd_is_ufs_dev_busy().
    
    Signed-off-by: Avri Altman <avri.altman@wdc.com>
    Link: https://lore.kernel.org/r/20241124070808.194860-2-avri.altman@wdc.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Stable-dep-of: 4fa382be4304 ("scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: Prepare to introduce a new clock_gating lock [+ + +]

Author: Avri Altman <avri.altman@wdc.com>
Date:   Sun Nov 24 09:08:06 2024 +0200

    scsi: ufs: core: Prepare to introduce a new clock_gating lock
    
    [ Upstream commit 7869c6521f5715688b3d1f1c897374a68544eef0 ]
    
    Remove hba->clk_gating.active_reqs check from ufshcd_is_ufs_dev_busy()
    function to separate clock gating logic from general device busy checks.
    
    Signed-off-by: Avri Altman <avri.altman@wdc.com>
    Link: https://lore.kernel.org/r/20241124070808.194860-3-avri.altman@wdc.com
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Stable-dep-of: 4fa382be4304 ("scsi: ufs: core: Fix ufshcd_is_ufs_dev_busy() and ufshcd_eh_timed_out()")
    Signed-off-by: Sasha Levin <sashal@kernel.org>

scsi: ufs: core: Start the RTC update work later [+ + +]

Author: Bart Van Assche <bvanassche@acm.org>
Date:   Thu Oct 31 14:26:24 2024 -0700

    scsi: ufs: core: Start the RTC update work later
    
    commit 54c814c8b23bc7617be3d46abdb896937695dbfa upstream.
    
    The RTC update work involves runtime resuming the UFS controller. Hence,
    only start the RTC update work after runtime power management in the UFS
    driver has been fully initialized. This patch fixes the following kernel
    crash:
    
    Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP
    Workqueue: events ufshcd_rtc_work
    Call trace:
     _raw_spin_lock_irqsave+0x34/0x8c (P)
     pm_runtime_get_if_active+0x24/0x9c (L)
     pm_runtime_get_if_active+0x24/0x9c
     ufshcd_rtc_work+0x138/0x1b4
     process_one_work+0x148/0x288
     worker_thread+0x2cc/0x3d4
     kthread+0x110/0x114
     ret_from_fork+0x10/0x20
    
    Reported-by: Neil Armstrong <neil.armstrong@linaro.org>
    Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@linaro.org/
    Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support")
    Cc: Bean Huo <beanhuo@micron.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20241031212632.2799127-1-bvanassche@acm.org
    Reviewed-by: Peter Wang <peter.wang@mediatek.com>
    Reviewed-by: Bean Huo <beanhuo@micron.com>
    Tested-by: Neil Armstrong <neil.armstrong@linaro.org> # on SM8650-HDK
    Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

SUNRPC: convert RPC_TASK_* constants to enum [+ + +]

Author: Stephen Brennan <stephen.s.brennan@oracle.com>
Date:   Mon Aug 19 08:58:59 2024 -0700

    SUNRPC: convert RPC_TASK_* constants to enum
    
    [ Upstream commit 0b108e83795c9c23101f584ef7e3ab4f1f120ef0 ]
    
    The RPC_TASK_* constants are defined as macros, which means that most
    kernel builds will not contain their definitions in the debuginfo.
    However, it's quite useful for debuggers to be able to view the task
    state constant and interpret it correctly. Conversion to an enum will
    ensure the constants are present in debuginfo and can be interpreted by
    debuggers without needing to hard-code them and track their changes.
    
    Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com>
    Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
    Stable-dep-of: 5bbd6e863b15 ("SUNRPC: Prevent looping due to rpc_signal_task() races")
    Signed-off-by: Sasha Levin <sashal@kernel.org>
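
The difference between macro and enum constants described above can be sketched in plain C. This is a minimal illustration, not the kernel's code: the demo_* names and bit positions are hypothetical and do not reproduce the real RPC_TASK_* definitions. Enumerators are part of a named type that the compiler knows about, while a macro is substituted by the preprocessor before compilation.

```c
#include <stdbool.h>

/*
 * Hypothetical task-state bit numbers, named only for illustration; they are
 * not the kernel's RPC_TASK_* definitions.
 */
enum demo_task_state {
	DEMO_TASK_RUNNING,	/* bit 0 */
	DEMO_TASK_QUEUED,	/* bit 1 */
	DEMO_TASK_ACTIVE,	/* bit 2 */
};

/* A macro constant: replaced by the preprocessor before the compiler sees it. */
#define DEMO_TASK_SIGNALLED_MACRO 3

/* Test one state bit in a run-state word, using the enum as the bit number. */
static inline bool demo_test_state(unsigned long runstate,
				   enum demo_task_state bit)
{
	return (runstate >> bit) & 1UL;
}
```

Because the enumerators belong to a type, a debugger working from debuginfo can map the numeric bit back to its name, which is the benefit the commit message describes.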

SUNRPC: Handle -ETIMEDOUT return from tlshd [+ + +]

Author: Benjamin Coddington <bcodding@redhat.com>
Date:   Tue Feb 11 12:31:57 2025 -0500

    SUNRPC: Handle -ETIMEDOUT return from tlshd
    
    [ Upstream commit 7a2f6f7687c5f7083a35317cddec5ad9fa491443 ]
    
    If the TLS handshake attempt returns -ETIMEDOUT, we currently translate
    that error into -EACCES.  This becomes problematic for cases where the RPC
    layer is attempting to re-connect in paths that don't reasonably handle
    -EACCES, for example: writeback.  The RPC layer can handle -ETIMEDOUT quite
    well, however - so if the handshake returns this error let's just pass it
    along.
    
    Fixes: 75eb6af7acdf ("SUNRPC: Add a TCP-with-TLS RPC transport class")
    Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
    Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

SUNRPC: Prevent looping due to rpc_signal_task() races [+ + +]

Author: Trond Myklebust <trond.myklebust@hammerspace.com>
Date:   Sat Feb 1 15:00:02 2025 -0500

    SUNRPC: Prevent looping due to rpc_signal_task() races
    
    [ Upstream commit 5bbd6e863b15a85221e49b9bdb2d5d8f0bb91f3d ]
    
    If rpc_signal_task() is called while a task is in an rpc_call_done()
    callback function, and the latter calls rpc_restart_call(), the task can
    end up looping due to the RPC_TASK_SIGNALLED flag being set without the
    tk_rpc_status being set.
    Removing the redundant mechanism for signalling the task fixes the
    looping behaviour.
    
    Reported-by: Li Lingfeng <lilingfeng3@huawei.com>
    Fixes: 39494194f93b ("SUNRPC: Fix races with rpc_killall_tasks()")
    Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

sunrpc: suppress warnings for unused procfs functions [+ + +]

Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Feb 25 15:52:21 2025 +0100

    sunrpc: suppress warnings for unused procfs functions
    
    [ Upstream commit 1f7a4f98c11fbeb18ed21f3b3a497e90a50ad2e0 ]
    
    There is a warning about unused variables when building with W=1 and no procfs:
    
    net/sunrpc/cache.c:1660:30: error: 'cache_flush_proc_ops' defined but not used [-Werror=unused-const-variable=]
     1660 | static const struct proc_ops cache_flush_proc_ops = {
          |                              ^~~~~~~~~~~~~~~~~~~~
    net/sunrpc/cache.c:1622:30: error: 'content_proc_ops' defined but not used [-Werror=unused-const-variable=]
     1622 | static const struct proc_ops content_proc_ops = {
          |                              ^~~~~~~~~~~~~~~~
    net/sunrpc/cache.c:1598:30: error: 'cache_channel_proc_ops' defined but not used [-Werror=unused-const-variable=]
     1598 | static const struct proc_ops cache_channel_proc_ops = {
          |                              ^~~~~~~~~~~~~~~~~~~~~~
    
    These are used inside of an #ifdef, so replacing that with an
    IS_ENABLED() check lets the compiler see how they are used while
    still dropping them during dead code elimination.
    
    Fixes: dbf847ecb631 ("knfsd: allow cache_register to return error on failure")
    Reviewed-by: Jeff Layton <jlayton@kernel.org>
    Acked-by: Chuck Lever <chuck.lever@oracle.com>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
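
The pattern described above, replacing a preprocessor guard with a compile-time constant condition, can be sketched as follows. This is a simplified illustration under stated assumptions: DEMO_HAVE_PROCFS is a hypothetical 0/1 constant standing in for a Kconfig option, the demo_* names are invented, and the kernel's IS_ENABLED() macro is more elaborate than this.

```c
#include <stddef.h>

/*
 * DEMO_HAVE_PROCFS is a hypothetical stand-in for a Kconfig option; the
 * kernel's IS_ENABLED() macro is more elaborate than a plain 0/1 constant.
 */
#ifndef DEMO_HAVE_PROCFS
#define DEMO_HAVE_PROCFS 0
#endif

struct demo_ops {
	int (*open)(void);
};

static int demo_open(void)
{
	return 0;
}

/* Referenced below in ordinary C code, so the compiler sees a user for it. */
static const struct demo_ops demo_proc_ops = {
	.open = demo_open,
};

/* Returns the ops that would be registered, or NULL when the feature is off. */
static const struct demo_ops *demo_select_ops(void)
{
	if (!DEMO_HAVE_PROCFS)
		return NULL;	/* constant condition: the rest is dead code */

	return &demo_proc_ops;
}
```

With an #ifdef around the use, the compiler would never see the reference to demo_proc_ops when the option is off. With the constant condition, the reference is still parsed and type-checked, and the optimizer is free to discard the unreachable branch.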

tcp: Defer ts_recent changes until req is owned [+ + +]

Author: Wang Hai <wanghai38@huawei.com>
Date:   Mon Feb 24 17:00:47 2025 +0800

    tcp: Defer ts_recent changes until req is owned
    
    [ Upstream commit 8d52da23b6c68a0f6bad83959ebb61a2cf623c4e ]
    
    Recently a bug was discovered where the server had entered TCP_ESTABLISHED
    state, but the upper layers were not notified.
    
    The same 5-tuple packet may be processed by different CPUs, so two
    CPUs may receive different ack packets at the same time when the
    state is TCP_NEW_SYN_RECV.
    
    In that case, req->ts_recent in tcp_check_req may be changed concurrently,
    which will probably cause the newsk's ts_recent to be incorrectly large,
    so that tcp_validate_incoming will fail. At this point, newsk will not be
    able to enter the TCP_ESTABLISHED state.
    
    cpu1                                    cpu2
    tcp_check_req
                                            tcp_check_req
     req->ts_recent = rcv_tsval = t1
                                             req->ts_recent = rcv_tsval = t2
    
     syn_recv_sock
      tcp_sk(child)->rx_opt.ts_recent = req->ts_recent = t2 // t1 < t2
    tcp_child_process
     tcp_rcv_state_process
      tcp_validate_incoming
       tcp_paws_check
        if ((s32)(rx_opt->ts_recent - rx_opt->rcv_tsval) <= paws_win)
            // t2 - t1 > paws_win, failed
                                            tcp_v4_do_rcv
                                             tcp_rcv_state_process
                                             // TCP_ESTABLISHED
    
    The cpu2's skb or a newly received skb will call tcp_v4_do_rcv to get
    the newsk into the TCP_ESTABLISHED state, but at this point it is no
    longer possible to notify the upper layer application. A notification
    mechanism could be added here, but the fix is more complex, so the
    current fix is used.
    
    In tcp_check_req, req->ts_recent is used to assign a value to
    tcp_sk(child)->rx_opt.ts_recent, so removing the change in req->ts_recent
    and changing tcp_sk(child)->rx_opt.ts_recent directly after owning the
    req fixes this bug.
    
    Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
    Signed-off-by: Wang Hai <wanghai38@huawei.com>
    Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
    Reviewed-by: Eric Dumazet <edumazet@google.com>
    Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
    Signed-off-by: Sasha Levin <sashal@kernel.org>
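
The timestamp comparison quoted in the diagram above can be tried out in isolation. The sketch below reproduces only that one signed 32-bit comparison; demo_paws_check is a hypothetical name, and the other conditions the kernel's tcp_paws_check() evaluates are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified form of the comparison quoted in the diagram: the segment is
 * acceptable when the stored ts_recent is not ahead of the segment's
 * timestamp value by more than paws_win, using wrap-safe signed arithmetic.
 */
static bool demo_paws_check(uint32_t ts_recent, uint32_t rcv_tsval,
			    int32_t paws_win)
{
	return (int32_t)(ts_recent - rcv_tsval) <= paws_win;
}
```

If cpu2 has stored a later timestamp t2 as ts_recent while cpu1 is still processing the segment carrying t1 < t2, the difference t2 - t1 is positive and exceeds a small paws_win, so the check rejects cpu1's segment. That is the failure the race produces.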

tracing: Fix bad hist from corrupting named_triggers list [+ + +]

Author: Steven Rostedt <rostedt@goodmis.org>
Date:   Thu Feb 27 16:39:44 2025 -0500

    tracing: Fix bad hist from corrupting named_triggers list
    
    commit 6f86bdeab633a56d5c6dccf1a2c5989b6a5e323e upstream.
    
    The following commands cause a crash:
    
     ~# cd /sys/kernel/tracing/events/rcu/rcu_callback
     ~# echo 'hist:name=bad:keys=common_pid:onmax(bogus).save(common_pid)' > trigger
     bash: echo: write error: Invalid argument
     ~# echo 'hist:name=bad:keys=common_pid' > trigger
    
    Because the following occurs:
    
    event_trigger_write() {
      trigger_process_regex() {
        event_hist_trigger_parse() {
    
          data = event_trigger_alloc(..);
    
          event_trigger_register(.., data) {
            cmd_ops->reg(.., data, ..) [hist_register_trigger()] {
              data->ops->init() [event_hist_trigger_init()] {
                save_named_trigger(name, data) {
                  list_add(&data->named_list, &named_triggers);
                }
              }
            }
          }
    
          ret = create_actions(); (return -EINVAL)
          if (ret)
            goto out_unreg;
    [..]
          ret = hist_trigger_enable(data, ...) {
            list_add_tail_rcu(&data->list, &file->triggers); <<<---- SKIPPED!!! (this is important!)
    [..]
     out_unreg:
          event_hist_unregister(.., data) {
            cmd_ops->unreg(.., data, ..) [hist_unregister_trigger()] {
              list_for_each_entry(iter, &file->triggers, list) {
                if (!hist_trigger_match(data, iter, named_data, false))   <- never matches
                    continue;
                [..]
                test = iter;
              }
              if (test && test->ops->free) <<<-- test is NULL
    
                test->ops->free(test) [event_hist_trigger_free()] {
                  [..]
                  if (data->name)
                    del_named_trigger(data) {
                      list_del(&data->named_list);  <<<<-- NEVER gets removed!
                    }
                  }
               }
             }
    
             [..]
             kfree(data); <<<-- frees item but it is still on list
    
    The next time a hist with a name is registered, it causes a use-after-free
    bug and the kernel can crash.
    
    Move the code around such that if event_trigger_register() succeeds, the
    next thing called is hist_trigger_enable() which adds it to the list.
    
    A bunch of actions is called if get_named_trigger_data() returns false.
    But that doesn't need to be called after event_trigger_register(), so it
    can be moved up, allowing event_trigger_register() to be called just
    before hist_trigger_enable() keeping them together and allowing the
    file->triggers to be properly populated.
    
    Cc: stable@vger.kernel.org
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
    Link: https://lore.kernel.org/20250227163944.1c37f85f@gandalf.local.home
    Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
    Reported-by: Tomas Glozar <tglozar@redhat.com>
    Tested-by: Tomas Glozar <tglozar@redhat.com>
    Reviewed-by: Tom Zanussi <zanussi@kernel.org>
    Closes: https://lore.kernel.org/all/CAP4=nvTsxjckSBTz=Oe_UYh8keD9_sZC4i++4h72mJLic4_W4A@mail.gmail.com/
    Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
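
The invariant that the trace above shows being violated is that an object must be unlinked from every list it sits on before it is freed. The sketch below illustrates that invariant with a minimal user-space list; it is not the patch itself (which reorders the registration and enable steps), and all demo_* names are hypothetical.

```c
#include <stdlib.h>

/* Minimal circular doubly linked list, in the spirit of the kernel's list_head. */
struct demo_node {
	struct demo_node *prev, *next;
};

static void demo_list_init(struct demo_node *head)
{
	head->prev = head->next = head;
}

/* Insert item right after head. */
static void demo_list_add(struct demo_node *item, struct demo_node *head)
{
	item->next = head->next;
	item->prev = head;
	head->next->prev = item;
	head->next = item;
}

/* Unlink item from whatever list it is on. */
static void demo_list_del(struct demo_node *item)
{
	item->prev->next = item->next;
	item->next->prev = item->prev;
	item->prev = item->next = NULL;
}

static int demo_list_empty(const struct demo_node *head)
{
	return head->next == head;
}

/*
 * Error-path teardown: unlink first, and only then free. Freeing first would
 * leave the list pointing at freed memory, which the next list operation
 * would then dereference.
 */
static void demo_destroy(struct demo_node *item)
{
	demo_list_del(item);
	free(item);
}
```

In the failing sequence, the equivalent of demo_list_del() was skipped for the named_triggers list because the unregister path never found the item on file->triggers, so the free happened while the item was still linked.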

uprobes: Reject the shared zeropage in uprobe_write_opcode() [+ + +]

Author: Tong Tiangen <tongtiangen@huawei.com>
Date:   Mon Feb 24 11:11:49 2025 +0800

    uprobes: Reject the shared zeropage in uprobe_write_opcode()
    
    [ Upstream commit bddf10d26e6e5114e7415a0e442ec6f51a559468 ]
    
    We triggered the following crash in syzkaller tests:
    
      BUG: Bad page state in process syz.7.38  pfn:1eff3
      page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1eff3
      flags: 0x3fffff00004004(referenced|reserved|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 003fffff00004004 ffffe6c6c07bfcc8 ffffe6c6c07bfcc8 0000000000000000
      raw: 0000000000000000 0000000000000000 00000000fffffffe 0000000000000000
      page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x32/0x50
       bad_page+0x69/0xf0
       free_unref_page_prepare+0x401/0x500
       free_unref_page+0x6d/0x1b0
       uprobe_write_opcode+0x460/0x8e0
       install_breakpoint.part.0+0x51/0x80
       register_for_each_vma+0x1d9/0x2b0
       __uprobe_register+0x245/0x300
       bpf_uprobe_multi_link_attach+0x29b/0x4f0
       link_create+0x1e2/0x280
       __sys_bpf+0x75f/0xac0
       __x64_sys_bpf+0x1a/0x30
       do_syscall_64+0x56/0x100
       entry_SYSCALL_64_after_hwframe+0x78/0xe2
    
       BUG: Bad rss-counter state mm:00000000452453e0 type:MM_FILEPAGES val:-1
    
    The following syzkaller test case can be used to reproduce:
    
      r2 = creat(&(0x7f0000000000)='./file0\x00', 0x8)
      write$nbd(r2, &(0x7f0000000580)=ANY=[], 0x10)
      r4 = openat(0xffffffffffffff9c, &(0x7f0000000040)='./file0\x00', 0x42, 0x0)
      mmap$IORING_OFF_SQ_RING(&(0x7f0000ffd000/0x3000)=nil, 0x3000, 0x0, 0x12, r4, 0x0)
      r5 = userfaultfd(0x80801)
      ioctl$UFFDIO_API(r5, 0xc018aa3f, &(0x7f0000000040)={0xaa, 0x20})
      r6 = userfaultfd(0x80801)
      ioctl$UFFDIO_API(r6, 0xc018aa3f, &(0x7f0000000140))
      ioctl$UFFDIO_REGISTER(r6, 0xc020aa00, &(0x7f0000000100)={{&(0x7f0000ffc000/0x4000)=nil, 0x4000}, 0x2})
      ioctl$UFFDIO_ZEROPAGE(r5, 0xc020aa04, &(0x7f0000000000)={{&(0x7f0000ffd000/0x1000)=nil, 0x1000}})
      r7 = bpf$PROG_LOAD(0x5, &(0x7f0000000140)={0x2, 0x3, &(0x7f0000000200)=ANY=[@ANYBLOB="1800000000120000000000000000000095"], &(0x7f0000000000)='GPL\x00', 0x7, 0x0, 0x0, 0x0, 0x0, '\x00', 0x0, @fallback=0x30, 0xffffffffffffffff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x10, 0x0, @void, @value}, 0x94)
      bpf$BPF_LINK_CREATE_XDP(0x1c, &(0x7f0000000040)={r7, 0x0, 0x30, 0x1e, @val=@uprobe_multi={&(0x7f0000000080)='./file0\x00', &(0x7f0000000100)=[0x2], 0x0, 0x0, 0x1}}, 0x40)
    
    The cause is that zero pfn is set to the PTE without increasing the RSS
    count in mfill_atomic_pte_zeropage() and the refcount of zero folio does
    not increase accordingly. Then, the operation on the same pfn is performed
    in uprobe_write_opcode()->__replace_page(), which unconditionally decreases
    the RSS count and old_folio's refcount.
    
    Therefore, two bugs are introduced:
    
     1. The RSS count is incorrect; when the process exits, check_mm() reports
        the error "Bad rss-count".
    
     2. The reserved folio (zero folio) is freed when folio->refcount is zero,
        then free_pages_prepare->free_page_is_bad() reports the error
        "Bad page state".
    
    There is more: the following warning could also theoretically be triggered:
    
      __replace_page()
        -> ...
          -> folio_remove_rmap_pte()
            -> VM_WARN_ON_FOLIO(is_zero_folio(folio), folio)
    
    Considering that uprobe hit on the zero folio is a very rare case, just
    reject zero old folio immediately after get_user_page_vma_remote().
    
    [ mingo: Cleaned up the changelog ]
    
    Fixes: 7396fa818d62 ("uprobes/core: Make background page replacement logic account for rss_stat counters")
    Fixes: 2b1444983508 ("uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints")
    Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Oleg Nesterov <oleg@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Masami Hiramatsu <mhiramat@kernel.org>
    Link: https://lore.kernel.org/r/20250224031149.1598949-1-tongtiangen@huawei.com
    Signed-off-by: Sasha Levin <sashal@kernel.org>

usbnet: gl620a: fix endpoint checking in genelink_bind() [+ + +]

Author: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Date:   Mon Feb 24 20:29:17 2025 +0300

    usbnet: gl620a: fix endpoint checking in genelink_bind()
    
    commit 1cf9631d836b289bd5490776551961c883ae8a4f upstream.
    
    Syzbot reports [1] a warning in usb_submit_urb() triggered by
    inconsistencies between expected and actually present endpoints
    in gl620a driver. Since genelink_bind() does not properly
    verify whether specified eps are in fact provided by the device,
    in this case, an artificially manufactured one, one may get a
    mismatch.
    
    Fix the issue by resorting to a usbnet utility function
    usbnet_get_endpoints(), usually reserved for this very problem.
    Check for endpoints and return early before proceeding further if
    any are missing.
    
    [1] Syzbot report:
    usb 5-1: Manufacturer: syz
    usb 5-1: SerialNumber: syz
    usb 5-1: config 0 descriptor??
    gl620a 5-1:0.23 usb0: register 'gl620a' at usb-dummy_hcd.0-1, ...
    ------------[ cut here ]------------
    usb 5-1: BOGUS urb xfer, pipe 3 != type 1
    WARNING: CPU: 2 PID: 1841 at drivers/usb/core/urb.c:503 usb_submit_urb+0xe4b/0x1730 drivers/usb/core/urb.c:503
    Modules linked in:
    CPU: 2 UID: 0 PID: 1841 Comm: kworker/2:2 Not tainted 6.12.0-syzkaller-07834-g06afb0f36106 #0
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
    Workqueue: mld mld_ifc_work
    RIP: 0010:usb_submit_urb+0xe4b/0x1730 drivers/usb/core/urb.c:503
    ...
    Call Trace:
     <TASK>
     usbnet_start_xmit+0x6be/0x2780 drivers/net/usb/usbnet.c:1467
     __netdev_start_xmit include/linux/netdevice.h:5002 [inline]
     netdev_start_xmit include/linux/netdevice.h:5011 [inline]
     xmit_one net/core/dev.c:3590 [inline]
     dev_hard_start_xmit+0x9a/0x7b0 net/core/dev.c:3606
     sch_direct_xmit+0x1ae/0xc30 net/sched/sch_generic.c:343
     __dev_xmit_skb net/core/dev.c:3827 [inline]
     __dev_queue_xmit+0x13d4/0x43e0 net/core/dev.c:4400
     dev_queue_xmit include/linux/netdevice.h:3168 [inline]
     neigh_resolve_output net/core/neighbour.c:1514 [inline]
     neigh_resolve_output+0x5bc/0x950 net/core/neighbour.c:1494
     neigh_output include/net/neighbour.h:539 [inline]
     ip6_finish_output2+0xb1b/0x2070 net/ipv6/ip6_output.c:141
     __ip6_finish_output net/ipv6/ip6_output.c:215 [inline]
     ip6_finish_output+0x3f9/0x1360 net/ipv6/ip6_output.c:226
     NF_HOOK_COND include/linux/netfilter.h:303 [inline]
     ip6_output+0x1f8/0x540 net/ipv6/ip6_output.c:247
     dst_output include/net/dst.h:450 [inline]
     NF_HOOK include/linux/netfilter.h:314 [inline]
     NF_HOOK include/linux/netfilter.h:308 [inline]
     mld_sendpack+0x9f0/0x11d0 net/ipv6/mcast.c:1819
     mld_send_cr net/ipv6/mcast.c:2120 [inline]
     mld_ifc_work+0x740/0xca0 net/ipv6/mcast.c:2651
     process_one_work+0x9c5/0x1ba0 kernel/workqueue.c:3229
     process_scheduled_works kernel/workqueue.c:3310 [inline]
     worker_thread+0x6c8/0xf00 kernel/workqueue.c:3391
     kthread+0x2c1/0x3a0 kernel/kthread.c:389
     ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147
     ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
     </TASK>
    
    Reported-by: syzbot+d693c07c6f647e0388d3@syzkaller.appspotmail.com
    Closes: https://syzkaller.appspot.com/bug?extid=d693c07c6f647e0388d3
    Fixes: 47ee3051c856 ("[PATCH] USB: usbnet (5/9) module for genesys gl620a cables")
    Cc: stable@vger.kernel.org
    Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
    Link: https://patch.msgid.link/20250224172919.1220522-1-n.zhandarovich@fintech.ru
    Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

vmlinux.lds: Ensure that const vars with relocations are mapped R/O [+ + +]

Author: Ard Biesheuvel <ardb@kernel.org>
Date:   Fri Feb 21 14:57:06 2025 +0100

    vmlinux.lds: Ensure that const vars with relocations are mapped R/O
    
    commit 68f3ea7ee199ef77551e090dfef5a49046ea8443 upstream.
    
    In the kernel, there are architectures (x86, arm64) that perform
    boot-time relocation (for KASLR) without relying on PIE codegen. In this
    case, all const global objects are emitted into .rodata, including const
    objects with fields that will be fixed up by the boot-time relocation
    code.  This implies that .rodata (and .text in some cases) need to be
    writable at boot, but they will usually be mapped read-only as soon as
    the boot completes.
    
    When using PIE codegen, the compiler will emit const global objects into
    .data.rel.ro rather than .rodata if the object contains fields that need
    such fixups at boot-time. This permits the linker to annotate such
    regions as requiring read-write access only at load time, but not at
    execution time (in user space), while keeping .rodata truly const (in
    user space, this is important for reducing the CoW footprint of dynamic
    executables).
    
    This distinction does not matter for the kernel, but it does imply that
    const data will end up in writable memory if the .data.rel.ro sections
    are not treated in a special way, as they will end up in the writable
    .data segment by default.
    
    So emit .data.rel.ro into the .rodata segment.
    
    Cc: stable@vger.kernel.org
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Link: https://lore.kernel.org/r/20250221135704.431269-5-ardb+git@google.com
    Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/apic: Provide apic_force_nmi_on_cpu() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:07 2023 +0200

    x86/apic: Provide apic_force_nmi_on_cpu()
    
    commit 9cab5fb776d4367e26950cf759211e948335288e upstream
    
    When SMT siblings are soft-offlined and parked in one of the play_dead()
    variants they still react on NMI, which is problematic on affected Intel
    CPUs. The default play_dead() variant uses MWAIT on modern CPUs, which is
    not guaranteed to be safe when updated concurrently.
    
    Right now late loading is prevented when not all SMT siblings are online,
    but as they still react on NMI, it is possible to bring them out of their
    park position into a trivial rendezvous handler.
    
    Provide a function which allows to do that. It does sanity checks whether
    the target is in the cpus_booted_once_mask and whether the APIC driver
    supports it.
    
    Mark X2APIC and XAPIC as capable, but exclude 32bit and the UV and NUMACHIP
    variants as that needs feedback from the relevant experts.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.603100036@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems [+ + +]

Author: Russell Senior <russell@personaltelco.net>
Date:   Tue Feb 25 22:31:20 2025 +0100

    x86/CPU: Fix warm boot hang regression on AMD SC1100 SoC systems
    
    [ Upstream commit bebe35bb738b573c32a5033499cd59f20293f2a3 ]
    
    I still have some Soekris net4826 in a Community Wireless Network I
    volunteer with. These devices use an AMD SC1100 SoC. I am running
    OpenWrt on them, which uses a patched kernel, that naturally has
    evolved over time.  I haven't updated the ones in the field in a
    number of years (circa 2017), but have one in a test bed, where I have
    intermittently tried out test builds.
    
    A few years ago, I noticed some trouble, particularly when "warm
    booting", that is, doing a reboot without removing power, and noticed
    the device was hanging after the kernel message:
    
      [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
    
    If I removed power and then restarted, it would boot fine, continuing
    through the message above, thusly:
    
      [    0.081615] Working around Cyrix MediaGX virtual DMA bugs.
      [    0.090076] Enable Memory-Write-back mode on Cyrix/NSC processor.
      [    0.100000] Enable Memory access reorder on Cyrix/NSC processor.
      [    0.100070] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
      [    0.110058] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0, 1GB 0
      [    0.120037] CPU: NSC Geode(TM) Integrated Processor by National Semi (family: 0x5, model: 0x9, stepping: 0x1)
      [...]
    
    In order to continue using modern tools, like ssh, to interact with
    the software on these old devices, I need modern builds of the OpenWrt
    firmware on the devices. I confirmed that the warm boot hang was still
    an issue in modern OpenWrt builds (currently using a patched linux
    v6.6.65).
    
    Last night, I decided it was time to get to the bottom of the warm
    boot hang, and began bisecting. From preserved builds, I narrowed down
    the bisection window from late February to late May 2019. During this
    period, the OpenWrt builds were using 4.14.x. I was able to build
    using period-correct Ubuntu 18.04.6. After a number of bisection
    iterations, I identified a kernel bump from 4.14.112 to 4.14.113 as
    the commit that introduced the warm boot hang.
    
      https://github.com/openwrt/openwrt/commit/07aaa7e3d62ad32767d7067107db64b6ade81537
    
    Looking at the upstream changes in the stable kernel between 4.14.112
    and 4.14.113 (tig v4.14.112..v4.14.113), I spotted a likely suspect:
    
      https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=20afb90f730982882e65b01fb8bdfe83914339c5
    
    So, I tried reverting just that kernel change on top of the breaking
    OpenWrt commit, and my warm boot hang went away.
    
    Presumably, the warm boot hang is due to some register not getting
    cleared in the same way that a loss of power does. That is
    approximately as much as I understand about the problem.
    
    After more poking/prodding and coaching from Jonas Gorski, it looks
    like this test patch fixes the problem on my board. Tested against
    v6.6.67 and v4.14.113.
    
    Fixes: 18fb053f9b82 ("x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors")
    Debugged-by: Jonas Gorski <jonas.gorski@gmail.com>
    Signed-off-by: Russell Senior <russell@personaltelco.net>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
    Link: https://lore.kernel.org/r/CAHP3WfOgs3Ms4Z+L9i0-iBOE21sdMk5erAiJurPjnrL9LSsgRA@mail.gmail.com
    Cc: Matthew Whitehead <tedheadster@gmail.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Sasha Levin <sashal@kernel.org>

x86/microcode/32: Move early loading after paging enable [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:32 2023 +0200

    x86/microcode/32: Move early loading after paging enable
    
    commit 0b62f6cb07738d7211d926c39f6946b87f72e792 upstream.
    
    32-bit loads microcode before paging is enabled. The commit which
    introduced that has zero justification in the changelog. The cover
    letter has slightly more content, but it does not give any technical
    justification either:
    
      "The problem in current microcode loading method is that we load a
       microcode way, way too late; ideally we should load it before turning
       paging on.  This may only be practical on 32 bits since we can't get
       to 64-bit mode without paging on, but we should still do it as early
       as at all possible."
    
    Handwaving word salad with zero technical content.
    
    Someone claimed in an offlist conversation that this is required for
    curing the ATOM erratum AAE44/AAF40/AAG38/AAH41. That erratum requires
    a microcode update in order to make the usage of PSE safe. But during
    early boot, PSE is completely irrelevant and it is evaluated way later.
    
    Neither is it relevant for the AP on single core HT enabled CPUs as the
    microcode loading on the AP is not doing anything.
    
    On dual core CPUs there is a theoretical problem if a split of an
    executable large page between enabling paging including PSE and loading
    the microcode happens. But that's only theoretical, it's practically
    irrelevant because the affected dual core CPUs are 64bit enabled and
    therefore have paging and PSE enabled before loading the microcode on
    the second core. So why would it work on 64-bit but not on 32-bit?
    
    The erratum:
    
      "AAG38 Code Fetch May Occur to Incorrect Address After a Large Page is
       Split Into 4-Kbyte Pages
    
       Problem: If software clears the PS (page size) bit in a present PDE
       (page directory entry), that will cause linear addresses mapped through
       this PDE to use 4-KByte pages instead of using a large page after old
       TLB entries are invalidated. Due to this erratum, if a code fetch uses
       this PDE before the TLB entry for the large page is invalidated then it
       may fetch from a different physical address than specified by either the
       old large page translation or the new 4-KByte page translation. This
       erratum may also cause speculative code fetches from incorrect addresses."
    
    The practical relevance for this is exactly zero because there is no
    splitting of large text pages during early boot-time, i.e. between paging
    enable and microcode loading, and neither during CPU hotplug.
    
    IOW, this load microcode before paging enable is yet another voodoo
    programming solution in search of a problem. What's worse is that it causes
    at least two serious problems:
    
     1) When stackprotector is enabled, the microcode loader code has the
        stackprotector mechanics enabled. The read from the per CPU variable
        __stack_chk_guard is always accessing the virtual address either
        directly on UP or via %fs on SMP. In physical address mode this
        results in an access to memory above 3GB. So this works by chance as
        the hardware returns the same value when there is no RAM at this
        physical address. When there is RAM populated above 3G then the read
        is by chance the same as nothing changes that memory during the very
        early boot stage. That's not necessarily true during runtime CPU
        hotplug.
    
     2) When function tracing is enabled, the relevant microcode loader
        functions and the functions invoked from there will call into the
        tracing code and evaluate global and per CPU variables in physical
        address mode. What could potentially go wrong?
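    
    A rough illustration of problem 1), assuming the common 32-bit layout
    in which directly mapped kernel virtual addresses start at 0xC0000000.
    PAGE_OFFSET_32, guard_vaddr and both helpers are made-up names for this
    sketch, not kernel symbols:

```c
#include <stdint.h>

/* Assumption: 3G/1G split, kernel virtual addresses begin at 3 GiB. */
#define PAGE_OFFSET_32 0xC0000000u

/* Hypothetical link-time (virtual) address of a kernel variable. */
static const uint32_t guard_vaddr = 0xC1234560u;

/* With paging on, a directly mapped virtual address resolves to
 * the physical address obtained by subtracting the offset. */
static uint32_t phys_with_paging(uint32_t vaddr)
{
    return vaddr - PAGE_OFFSET_32;
}

/* With paging off, the address bits go out on the bus unchanged,
 * so the same access lands above 3 GiB of physical address space. */
static uint32_t phys_without_paging(uint32_t vaddr)
{
    return vaddr;
}
```

    The same pointer therefore names two different physical locations
    depending on whether paging is enabled, which is why code built for
    virtual addressing only works by chance in physical address mode.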
    
    Cure this and move the microcode loading after the early paging enable, use
    the new temporary initrd mapping and remove the gunk in the microcode
    loader which is required to handle physical address mode.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211722.348298216@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Add get_patch_level() [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Thu Jan 23 13:02:32 2025 +0100

    x86/microcode/AMD: Add get_patch_level()
    
    commit 037e81fb9d2dfe7b31fd97e5f578854e38f09887 upstream
    
    Put the MSR_AMD64_PATCH_LEVEL reading of the current microcode revision
    the hw has, into a separate function.
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20250211163648.30531-6-bp@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/amd: Cache builtin microcode too [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 10 17:08:43 2023 +0200

    x86/microcode/amd: Cache builtin microcode too
    
    commit d419d28261e72e1c9ec418711b3da41df2265139 upstream
    
    save_microcode_in_initrd_amd() fails to cache builtin microcode and only
    scans initrd.
    
    Use find_blobs_in_containers() instead which covers both.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231010150702.495139089@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/amd: Cache builtin/initrd microcode early [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:53 2023 +0200

    x86/microcode/amd: Cache builtin/initrd microcode early
    
    commit a7939f01672034a58ad3fdbce69bb6c665ce0024 upstream
    
    There is no reason to scan builtin/initrd microcode on each AP.
    
    Cache the builtin/initrd microcode in an early initcall so that the
    early AP loader can utilize the cache.
    
    The existing fs initcall which invoked save_microcode_in_initrd_amd() is
    still required to maintain the initrd_gone flag. Rename it accordingly.
    This will be removed once the AP loader code is converted to use the
    cache.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211723.187566507@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Fix a -Wsometimes-uninitialized clang false positive [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Tue Jul 30 09:52:43 2024 +0200

    x86/microcode/AMD: Fix a -Wsometimes-uninitialized clang false positive
    
    commit 5343558a868e7e635b40baa2e46bf53df1a2d131 upstream.
    
    Initialize equiv_id in order to shut up:
    
      arch/x86/kernel/cpu/microcode/amd.c:714:6: warning: variable 'equiv_id' is \
      used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]
            if (x86_family(bsp_cpuid_1_eax) < 0x17) {
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    
    because clang doesn't do interprocedural analysis for warnings to see
    that this variable won't be used uninitialized.
    
    Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202407291815.gJBST0P3-lkp@intel.com/
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Flush patch buffer mapping after application [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Tue Nov 19 12:21:33 2024 +0100

    x86/microcode/AMD: Flush patch buffer mapping after application
    
    commit c809b0d0e52d01c30066367b2952c4c4186b1047 upstream
    
    Due to specific requirements while applying microcode patches on Zen1
    and 2, the patch buffer mapping needs to be flushed from the TLB after
    application. Do so.
    
    If not, unnecessary and unnatural delays happen in the boot process.
    
    Reported-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Thomas De Schampheleire <thomas.de_schampheleire@nokia.com>
    Cc: <stable@kernel.org> # f1d84b59cbb9 ("x86/mm: Carve out INVLPG inline asm for use by others")
    Link: https://lore.kernel.org/r/ZyulbYuvrkshfsd2@antipodes
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Thu Jan 23 12:51:37 2025 +0100

    x86/microcode/AMD: Get rid of the _load_microcode_amd() forward declaration
    
    commit b39c387164879eef71886fc93cee5ca7dd7bf500 upstream
    
    Simply move save_microcode_in_initrd() down.
    
    No functional changes.
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20250211163648.30531-5-bp@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Have __apply_microcode_amd() return bool [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Mon Nov 18 17:17:24 2024 +0100

    x86/microcode/AMD: Have __apply_microcode_amd() return bool
    
    commit 78e0aadbd4c6807a06a9d25bc190fe515d3f3c42 upstream
    
    This is the natural thing to do anyway.
    
    No functional changes.
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Load only SHA256-checksummed patches [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Thu Jan 23 14:44:53 2025 +0100

    x86/microcode/AMD: Load only SHA256-checksummed patches
    
    commit 50cef76d5cb0e199cda19f026842560f6eedc4f7 upstream
    
    Load patches for which the driver carries a SHA256 checksum of the patch
    blob.
    
    This can be disabled by adding "microcode.amd_sha_check=off" on the
    kernel cmdline. But it is highly NOT recommended.
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Make __verify_patch_size() return bool [+ + +]

Author: Nikolay Borisov <nik.borisov@suse.com>
Date:   Fri Oct 18 18:51:50 2024 +0300

    x86/microcode/AMD: Make __verify_patch_size() return bool
    
    commit d8317f3d8e6b412ff51ea66f1de2b2f89835f811 upstream
    
    The result of that function is in essence boolean, so simplify to return the
    result of the relevant expression. It also makes it follow the convention used
    by __verify_patch_section().
    
    No functional changes.
    
    Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20241018155151.702350-3-nik.borisov@suse.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Merge early_apply_microcode() into its single callsite [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Thu Jan 23 12:46:45 2025 +0100

    x86/microcode/AMD: Merge early_apply_microcode() into its single callsite
    
    commit dc15675074dcfd79a2f10a6e39f96b0244961a01 upstream
    
    No functional changes.
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20250211163648.30531-4-bp@kernel.org
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Pay attention to the stepping dynamically [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Mon Oct 21 10:27:52 2024 +0200

    x86/microcode/AMD: Pay attention to the stepping dynamically
    
    commit d1744a4c975b1acbe8b498356d28afbc46c88428 upstream
    
    Commit in Fixes changed how a microcode patch is loaded on Zen and newer but
    the patch matching needs to happen with different rigidity, depending on what
    is being done:
    
    1) When the patch is added to the patches cache, the stepping must be ignored
       because the driver still supports different steppings per system
    
    2) When the patch is matched for loading, then the stepping must be taken into
       account because each CPU needs the patch matching its exact stepping
    
    Take care of that by making the matching smarter.
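    
    The two matching modes can be sketched roughly as follows. The bit
    layout (revision in bits 7:0, stepping in bits 11:8, family/model
    above) and the name patch_ids_match() are assumptions made for
    illustration and are not taken from the driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed patch ID layout, for illustration only. */
#define REV_MASK      0x000000ffu   /* bits 7:0:  patch revision */
#define STEPPING_MASK 0x00000f00u   /* bits 11:8: CPU stepping   */

/*
 * Compare two patch IDs. The revision never takes part in the
 * comparison; the stepping takes part only when the caller asks
 * for an exact match (loading), not when filling the cache.
 */
static bool patch_ids_match(uint32_t a, uint32_t b, bool ignore_stepping)
{
    uint32_t drop = REV_MASK;

    if (ignore_stepping)
        drop |= STEPPING_MASK;

    return (a & ~drop) == (b & ~drop);
}
```

    In this sketch the cache path would pass ignore_stepping = true and
    the load path would pass false.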
    
    Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
    Reported-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Jens Axboe <axboe@kernel.dk>
    Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Return bool from find_blobs_in_containers() [+ + +]

Author: Nikolay Borisov <nik.borisov@suse.com>
Date:   Fri Oct 18 18:51:49 2024 +0300

    x86/microcode/AMD: Return bool from find_blobs_in_containers()
    
    commit a85c08aaa665b5436d325f6d7138732a0e1315ce upstream
    
    Instead of open-coding the check for size/data, move it inside the
    function and make it return a boolean indicating whether data was found
    or not.
    
    No functional changes.
    
      [ bp: Write @ret in find_blobs_in_containers() only on success. ]
    
    Signed-off-by: Nikolay Borisov <nik.borisov@suse.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20241018155151.702350-2-nik.borisov@suse.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Split load_microcode_amd() [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Mon Oct 21 10:38:21 2024 +0200

    x86/microcode/AMD: Split load_microcode_amd()
    
    commit 1d81d85d1a19e50d5237dc67d6b825c34ae13de8 upstream
    
    This function should've been split a long time ago because it is used in
    two paths:
    
    1) On the late loading path, when the microcode is loaded through the
       request_firmware interface
    
    2) In the save_microcode_in_initrd() path which collects all the
       microcode patches which are relevant for the current system before
       the initrd with the microcode container has been jettisoned.
    
       In that path, it is not really necessary to iterate over the nodes on
       a system and match a patch. However, it didn't cause any trouble so it
       was left for a later cleanup.
    
    However, that later cleanup was expedited by the fact that Jens was
    enabling "Use L3 as a NUMA node" in the BIOS setting in his machine and
    so this causes the NUMA CPU masks used in cpumask_of_node() to be
    generated *after* 2) above happened on the first node. Which means, all
    those masks were funky, wrong, uninitialized and whatnot, leading to
    explosions when dereffing c->microcode in load_microcode_amd().
    
    So split that function and do only the necessary work needed at each
    stage.
    
    Fixes: 94838d230a6c ("x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID")
    Reported-by: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Jens Axboe <axboe@kernel.dk>
    Link: https://lore.kernel.org/r/91194406-3fdf-4e38-9838-d334af538f74@kernel.dk
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/amd: Use cached microcode for AP load [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:55 2023 +0200

    x86/microcode/amd: Use cached microcode for AP load
    
    commit 5af05b8d51a8e3ff5905663655c0f46d1aaae44a upstream
    
    Now that the microcode cache is initialized before the APs are brought
    up, there is no point in scanning builtin/initrd microcode during AP
    loading.
    
    Convert the AP loader to utilize the cache, which in turn makes the CPU
    hotplug callback which applies the microcode after initrd/builtin is
    gone, obsolete as the early loading during late hotplug operations
    including the resume path depends now only on the cache.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211723.243426023@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/amd: Use correct per CPU ucode_cpu_info [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 10 17:08:41 2023 +0200

    x86/microcode/amd: Use correct per CPU ucode_cpu_info
    
    commit ecfd41089348fa4cc767dc588367e9fdf8cb6b9d upstream
    
    find_blobs_in_containers() is invoked on every CPU but unconditionally
    overwrites the ucode_cpu_info of CPU0.
    
    Fix this by using the proper CPU data and move the assignment into the
    call site apply_ucode_from_containers() so that the function can be
    reused.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231010150702.433454320@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID [+ + +]

Author: Borislav Petkov <bp@alien8.de>
Date:   Thu Jul 25 13:20:37 2024 +0200

    x86/microcode/AMD: Use the family,model,stepping encoded in the patch ID
    
    commit 94838d230a6c835ced1bad06b8759e0a5f19c1d3 upstream
    
    On Zen and newer, the family, model and stepping are part of the
    microcode patch ID so that the equivalence table the driver has been
    using, is not needed anymore.
    
    So switch the driver to use that from now on.
    
    The equivalence table in the microcode blob should still remain in case
    there's need to pass some additional information to the kernel loader.
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20240725112037.GBZqI1BbUk1KMlOJ_D@fat_crate.local
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Cleanup code further [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:41 2023 +0200

    x86/microcode/intel: Cleanup code further
    
    commit 0177669ee61de4dc641f9ad86a3df6f22327cf6c upstream
    
    Sanitize the microcode scan loop, fixup printks and move the loading
    function for builtin microcode next to the place where it is used and mark
    it __init.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.389400871@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Remove unnecessary cache writeback and invalidation [+ + +]

Author: Chang S. Bae <chang.seok.bae@intel.com>
Date:   Tue Oct 1 09:10:36 2024 -0700

    x86/microcode/intel: Remove unnecessary cache writeback and invalidation
    
    commit 9a819753b0209c6edebdea447a1aa53e8c697653 upstream
    
    Currently, an unconditional cache flush is performed during every
    microcode update. Although the original changelog did not mention
    a specific erratum, this measure was primarily intended to address
    a specific microcode bug, the load of which has already been blocked by
    is_blacklisted(). Therefore, this cache flush is no longer necessary.
    
    Additionally, the side effects of doing this have been overlooked. It
    increases CPU rendezvous time during late loading, where the cache flush
    takes between 1x and 3.5x longer than the actual microcode update.
    
    Remove native_wbinvd() and update the erratum name to align with the
    latest errata documentation, document ID 334163 Version 022US.
    
      [ bp: Zap the flaky documentation URL. ]
    
    Fixes: 91df9fdf5149 ("x86/microcode/intel: Writeback and invalidate caches before updating microcode")
    Reported-by: Yan Hua Wu <yanhua1.wu@intel.com>
    Reported-by: William Xie <william.xie@intel.com>
    Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Acked-by: Ashok Raj <ashok.raj@intel.com>
    Tested-by: Yan Hua Wu <yanhua1.wu@intel.com>
    Link: https://lore.kernel.org/r/20241001161042.465584-2-chang.seok.bae@intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Reuse intel_cpu_collect_info() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:49 2023 +0200

    x86/microcode/intel: Reuse intel_cpu_collect_info()
    
    commit 11f96ac4c21e701650c7d8349b252973185ac6ce upstream
    
    No point for an almost duplicate function.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.741173606@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Rework intel_cpu_collect_info() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:45 2023 +0200

    x86/microcode/intel: Rework intel_cpu_collect_info()
    
    commit 164aa1ca537238c46923ccacd8995b4265aee47b upstream
    
    Nothing needs struct ucode_cpu_info. Make it take struct cpu_signature,
    let it return a boolean and simplify the implementation. Rename it now
    that the silly name clash with collect_cpu_info() is gone.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211722.851573238@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Rework intel_find_matching_signature() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:50 2023 +0200

    x86/microcode/intel: Rework intel_find_matching_signature()
    
    commit b7fcd995b261c9976e05f47554529c98a0f1cbb0 upstream
    
    Take a cpu_signature argument and work from there. Move the match()
    helper next to the callsite as there is no point for having it in
    a header.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.797820205@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Rip out mixed stepping support for Intel CPUs [+ + +]

Author: Ashok Raj <ashok.raj@intel.com>
Date:   Tue Oct 17 23:23:33 2023 +0200

    x86/microcode/intel: Rip out mixed stepping support for Intel CPUs
    
    commit ae76d951f6537001bdf77894d19cd4a446de337e upstream
    
    Mixed steppings aren't supported on Intel CPUs. Only one microcode patch
    is required for the entire system. The caching of microcode blobs which
    match the family and model is therefore pointless and in fact is
    dysfunctional as CPU hotplug updates use only a single microcode blob,
    i.e. the one where *intel_ucode_patch points to.
    
    Remove the microcode cache and make it an AMD local feature.
    
      [ tglx:
         - save only at the end. Otherwise random microcode ends up in the
           pointer for early loading
         - free the ucode patch pointer in save_microcode_patch() only
           after kmemdup() has succeeded, as reported by Andrew Cooper ]
    
    Originally-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ashok Raj <ashok.raj@intel.com>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211722.404362809@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Save the microcode only after a successful late-load [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:44 2023 +0200

    x86/microcode/intel: Save the microcode only after a successful late-load
    
    commit 2a1dada3d1cf8f80a27663653a371d99dbf5d540 upstream
    
    There are situations where the late microcode is loaded into memory but
    is not applied:
    
      1) The rendezvous fails
      2) The microcode is rejected by the CPUs
    
    If any of this happens then the pointer which was updated at firmware
    load time is stale and subsequent CPU hotplug operations either fail to
    update or create inconsistent microcode state.
    
    Save the loaded microcode in a separate pointer before the late load is
    attempted and when successful, update the hotplug pointer accordingly
    via a new microcode_ops callback.
    
    Remove the pointless fallback in the loader to a microcode pointer which
    is never populated.
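    
    A minimal sketch of the staging idea, assuming two pointers and a
    finalize step. All names here (ucode_blob, staged_patch,
    hotplug_patch, stage_patch(), finalize_late_load()) are hypothetical
    and memory management is left out:

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a microcode image; only the revision matters here. */
struct ucode_blob {
    unsigned int rev;
};

static struct ucode_blob *hotplug_patch; /* what CPU hotplug applies    */
static struct ucode_blob *staged_patch;  /* loaded, but not yet applied */

/* Firmware load time: remember the candidate without publishing it. */
static void stage_patch(struct ucode_blob *p)
{
    staged_patch = p;
}

/* After the rendezvous: publish the candidate only on success. */
static void finalize_late_load(bool success)
{
    if (success)
        hotplug_patch = staged_patch;

    staged_patch = NULL;
}
```

    With this split, a failed rendezvous or a rejected image leaves the
    hotplug pointer at the previously applied microcode.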
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.505491309@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Set new revision only after a successful update [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Fri Dec 1 14:35:06 2023 +0100

    x86/microcode/intel: Set new revision only after a successful update
    
    commit 9c21ea53e6bd1104c637b80a0688040f184cc761 upstream
    
    This was meant to be done only when early microcode got updated
    successfully. Move it into the if-branch.
    
    Also, make sure the current revision is read unconditionally and only
    once.
    
    Fixes: 080990aa3344 ("x86/microcode: Rework early revisions reporting")
    Reported-by: Ashok Raj <ashok.raj@intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Tested-by: Ashok Raj <ashok.raj@intel.com>
    Link: https://lore.kernel.org/r/ZWjVt5dNRjbcvlzR@a4bf019067fa.jf.intel.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Simplify and rename generic_load_microcode() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:40 2023 +0200

    x86/microcode/intel: Simplify and rename generic_load_microcode()
    
    commit 6b072022ab2e1e83b7588144ee0080f7197b71da upstream
    
    so it becomes less obfuscated and rename it because there is nothing
    generic about it.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.330295409@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Simplify early loading [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:43 2023 +0200

    x86/microcode/intel: Simplify early loading
    
    commit dd5e3e3ca6ac011582a9f3f987493bf6741568c0 upstream.
    
    The early loading code is overly complicated:
    
      - It scans the builtin/initrd for microcode not only on the BSP, but also
        on all APs during early boot and then later in the boot process it
        scans again to duplicate and save the microcode before initrd goes
        away.
    
        That's a pointless exercise because this can be simply done before
        bringing up the APs when the memory allocator is up and running.
    
      - Saving the microcode from within the scan loop is completely
        non-obvious and a leftover of the microcode cache.
    
        This can be done at the call site now which makes it obvious.
    
    Rework the code so that only the BSP scans the builtin/initrd microcode
    once during early boot and save it away in an early initcall for later
    use.
    
      [ bp: Test and fold in a fix from tglx on top which handles the need to
        distinguish what save_microcode() does depending on when it is
        called:
    
         - when on the BSP during early load, it needs to find a newer
           revision than the one currently loaded on the BSP
    
         - later, before SMP init, it still runs on the BSP and gets the BSP
           revision just loaded and uses that revision to know which patch
           to save for the APs. For that it needs to find the exact one as
           on the BSP.
       ]
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211722.629085215@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Simplify scan_microcode() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:39 2023 +0200

    x86/microcode/intel: Simplify scan_microcode()
    
    commit b0f0bf5eef5fac6ba30b7cac15ca4cb01f8a6ca9 upstream
    
    Make it readable and comprehensible.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.271940980@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Switch to kvmalloc() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:45 2023 +0200

    x86/microcode/intel: Switch to kvmalloc()
    
    commit f24f204405f9875bc539c6e88553fd5ac913c867 upstream
    
    Microcode blobs are getting larger and might soon reach the kmalloc()
    limit. Switch over to kvmalloc().
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115902.564323243@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode/intel: Unify microcode apply() functions [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:44 2023 +0200

    x86/microcode/intel: Unify microcode apply() functions
    
    commit 3973718cff1e3a5d88ea78ec28ecca2afa60b30b upstream
    
    Deduplicate the early and late apply() functions.
    
      [ bp: Rename the function which does the actual application to
          __apply_microcode() to differentiate it from
          microcode_ops.apply_microcode(). ]
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20231017211722.795508212@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Add per CPU control field [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:01 2023 +0200

    x86/microcode: Add per CPU control field
    
    commit ba3aeb97cb2c53025356f31c5a0a294385194115 upstream
    
    Add a per CPU control field to ucode_ctrl and define constants for it
    which are going to be used to control the loading state machine.
    
    In theory this could be a global control field, but a global control does
    not cover the following case:
    
     15 primary CPUs load microcode successfully
      1 primary CPU fails and returns with an error code
    
    With global control the sibling of the failed CPU would either try again or
    the whole operation would be aborted with the consequence that the 15
    siblings do not invoke the apply path and end up with inconsistent software
    state. The result in dmesg would be inconsistent too.
    
    Two additional fields are added and initialized: ctrl_cpu and
    secondaries. ctrl_cpu is the CPU number of the primary thread
    for now, but with the upcoming uniform loading at package or system scope
    this will be one CPU per package or just one CPU. Secondaries hands the
    control CPU a CPU mask which will be required to release the secondary CPUs
    out of the wait loop.
    
    Preparatory change for implementing a properly split control flow for
    primary and secondary CPUs.
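    
    The 15-plus-1 case above can be modelled in a few lines of userspace C.
    This is only a sketch under stated assumptions: the enum values, the
    struct layout and both helper functions are hypothetical stand-ins and
    not the kernel's definitions, and the model assumes 16 cores with two
    threads each. It shows that with a per CPU control field only the sibling
    of the failed primary skips the apply path.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real definitions live in the kernel sources. */
enum sibling_ctrl { SCTRL_WAIT, SCTRL_APPLY, SCTRL_DONE };
enum ucode_state  { UCODE_OK, UCODE_UPDATED, UCODE_ERROR };

struct ucode_ctrl {
	enum sibling_ctrl ctrl;      /* per CPU: what this CPU should do next */
	enum ucode_state  result;    /* per CPU: outcome of the load attempt  */
	unsigned int      ctrl_cpu;  /* CPU number of the controlling primary */
};

/*
 * Release one sibling based on the result of its own primary. A failed
 * primary tells only its sibling to skip the apply path; other cores are
 * not affected.
 */
static void release_sibling(struct ucode_ctrl *cpus, unsigned int sibling)
{
	const struct ucode_ctrl *primary = &cpus[cpus[sibling].ctrl_cpu];

	assert(cpus[sibling].ctrl == SCTRL_WAIT);   /* must still be waiting */
	cpus[sibling].ctrl = (primary->result == UCODE_UPDATED) ?
			     SCTRL_APPLY : SCTRL_DONE;
}

/* 16 cores, two threads each; the primary of 'failing_core' fails. */
static unsigned int count_siblings_applying(unsigned int failing_core)
{
	struct ucode_ctrl cpus[32];
	unsigned int core, applying = 0;

	for (core = 0; core < 16; core++) {
		unsigned int primary = 2 * core, sibling = 2 * core + 1;

		cpus[primary] = (struct ucode_ctrl){ SCTRL_DONE,
			core == failing_core ? UCODE_ERROR : UCODE_UPDATED,
			primary };
		cpus[sibling] = (struct ucode_ctrl){ SCTRL_WAIT, UCODE_OK,
			primary };

		release_sibling(cpus, sibling);
		if (cpus[sibling].ctrl == SCTRL_APPLY)
			applying++;
	}
	return applying;
}
```

    In this model count_siblings_applying(3) yields 15: the siblings of the
    successful primaries still invoke the apply path, which a single global
    control value could not express.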
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.319959519@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Add per CPU result state [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:24:05 2023 +0200

    x86/microcode: Add per CPU result state
    
    commit 4b753955e9151ad2f722137a7bcbafda756186b3 upstream
    
    The microcode rendezvous acts purely on global state, which does not
    allow failures to be analyzed in a coherent way.
    
    Introduce per CPU state into which the results are written, which allows
    the return codes of the individual CPUs to be analyzed.
    
    Initialize the state when walking the cpu_present_mask in the online
    check to avoid another for_each_cpu() loop.
    
    Enhance the result print out with that.
    
    The structure is intentionally named ucode_ctrl as it will gain control
    fields in subsequent changes.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211723.632681010@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Clarify the late load logic [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:57 2023 +0200

    x86/microcode: Clarify the late load logic
    
    commit 6f059e634dcd0d725854514c94c114bbdd83950d upstream
    
    reload_store() is way too complicated. Split the inner workings out and
    make the following enhancements:
    
     - Taint the kernel only when the microcode was actually updated. If, e.g.,
       the rendezvous fails, then nothing happened and there is no reason for
       tainting.
    
     - Return useful error codes
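    
    A minimal userspace sketch of the two enhancements, assuming a
    simplified three-way outcome. The enum, the struct and the function name
    are hypothetical, and the errno values are illustrative choices rather
    than the ones the kernel returns.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of a late load attempt. */
enum load_outcome {
	LOAD_UPDATED,            /* microcode was actually applied     */
	LOAD_NO_NEW_MICROCODE,   /* nothing newer than what is running */
	LOAD_RENDEZVOUS_FAILED,  /* CPUs did not rendezvous in time    */
};

struct late_load_result {
	int  err;    /* 0 on success, negative errno otherwise */
	bool taint;  /* whether the kernel would be tainted    */
};

/* Map the outcome to an error code and decide whether to taint. */
static struct late_load_result finish_late_load(enum load_outcome outcome)
{
	struct late_load_result res = { .err = 0, .taint = false };

	switch (outcome) {
	case LOAD_UPDATED:
		res.taint = true;     /* the only case that taints */
		break;
	case LOAD_NO_NEW_MICROCODE:
		res.err = -ENOENT;    /* illustrative errno choice */
		break;
	case LOAD_RENDEZVOUS_FAILED:
		res.err = -EIO;       /* illustrative errno choice */
		break;
	}
	return res;
}
```

    The point of the split is that the taint decision depends only on
    whether something was applied, while every failure path reports a
    distinct error to the caller.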
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
    Link: https://lore.kernel.org/r/20231002115903.145048840@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Clean up mc_cpu_down_prep() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:55 2023 +0200

    x86/microcode: Clean up mc_cpu_down_prep()
    
    commit ba48aa32388ac652256baa8d0a6092d350160da0 upstream
    
    This function has nothing to do with suspend. It's a hotplug
    callback. Remove the bogus comment.
    
    Drop the pointless debug printk. The hotplug core provides tracepoints
    which track the invocation of those callbacks.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.028651784@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Get rid of the schedule work indirection [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:58 2023 +0200

    x86/microcode: Get rid of the schedule work indirection
    
    commit 2e1997335ceb6fc819862804f51d4fe83593c138 upstream
    
    Scheduling work on all CPUs to collect the microcode information is just
    another extra step for no value. Let the CPU hotplug callback registration
    do it.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211723.354748138@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Handle "nosmt" correctly [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:56 2023 +0200

    x86/microcode: Handle "nosmt" correctly
    
    commit 634ac23ad609b3ddd9e0e478bd5afbf49d3a2556 upstream
    
    On CPUs where microcode loading is not NMI-safe the SMT siblings which
    are parked in one of the play_dead() variants still react to NMIs.
    
    So if an NMI hits while the primary thread updates the microcode the
    resulting behaviour is undefined. The default play_dead() implementation on
    modern CPUs is using MWAIT which is not guaranteed to be safe against
    a microcode update which affects MWAIT.
    
    Take the cpus_booted_once_mask into account to detect this case and
    refuse to load late if the vendor specific driver does not advertise
    that late loading is NMI safe.
    
    AMD stated that this is safe, so mark the AMD driver accordingly.
    
    This requirement will be partially lifted in later changes.
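    
    The check can be sketched with plain bitmasks. This is a simplified
    model and not the kernel code: the function name and the 64-bit masks
    are assumptions, and it ignores any distinction between primary and
    secondary threads. A CPU which was booted once but is not online is
    treated as parked in play_dead().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * booted_once: CPUs which have been brought up at least once
 * online:      CPUs which are currently online
 * nmi_safe:    vendor driver advertises NMI-safe late loading
 */
static bool late_load_allowed(uint64_t booted_once, uint64_t online,
			      bool nmi_safe)
{
	/* CPUs parked in play_dead(): booted once, but offline now. */
	uint64_t parked = booted_once & ~online;

	/* An online CPU must have been booted at least once. */
	assert((online & ~booted_once) == 0);

	return parked == 0 || nmi_safe;
}
```

    With "nosmt" on a four-thread system, booted_once would be 0xF and
    online 0x5, so the load is refused unless nmi_safe is set.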
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.087472735@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Handle "offline" CPUs correctly [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:08 2023 +0200

    x86/microcode: Handle "offline" CPUs correctly
    
    commit 8f849ff63bcbc77670da03cb8f2b78b06257f455 upstream
    
    Offline CPUs need to be parked in a safe loop when microcode update is
    in progress on the primary CPU. Currently, offline CPUs are parked in
    mwait_play_dead(), which is not safe on Intel CPUs: the new microcode
    update can patch the MWAIT instruction, which can cause instability.
    
      - Add a new microcode state 'UCODE_OFFLINE' to report status on a
        per-CPU basis.
      - Force NMI on the offline CPUs.
    
    Wake up offline CPUs while the update is in progress and then return
    them back to mwait_play_dead() after microcode update is complete.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.660850472@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Mop up early loading leftovers [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:56 2023 +0200

    x86/microcode: Mop up early loading leftovers
    
    commit 8529e8ab6c6fab8ebf06ead98e77d7646b42fc48 upstream
    
    Get rid of the initrd_gone hack which was required to keep
    find_microcode_in_initrd() functional after init.
    
    As find_microcode_in_initrd() is now only used during init, mark it
    accordingly.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211723.298854846@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Prepare for minimal revision check [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:24:16 2023 +0200

    x86/microcode: Prepare for minimal revision check
    
    commit 9407bda845dd19756e276d4f3abc15a20777ba45 upstream
    
    Applying microcode late can be fatal for the running kernel when the
    update changes functionality which is in use already in a non-compatible
    way, e.g. by removing a CPUID bit.
    
    There is no way for admins who do not have access to the vendor's deep
    technical support to decide whether late loading of such a microcode is
    safe or not.
    
    Intel has added a new field to the microcode header which tells the
    minimal microcode revision which is required to be active in the CPU in
    order to be safe.
    
    Provide infrastructure for handling this in the core code and a command
    line switch which allows enforcing it.
    
    If the update is considered safe, the kernel is not tainted and the
    annoying warning message is not emitted. If it's enforced and the
    currently loaded microcode revision is not safe for late loading, then
    the load is aborted.
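    
    The decision described above can be sketched as a small pure function.
    All names here are hypothetical. The sketch also assumes that a minimal
    revision of zero means the header does not carry the information, which
    this changelog does not state.

```c
#include <stdbool.h>
#include <stdint.h>

enum late_load_verdict {
	LOAD_SAFE,     /* load, no taint, no warning     */
	LOAD_TAINTED,  /* load, but taint and warn       */
	LOAD_ABORTED,  /* enforcement on, refuse to load */
};

/*
 * cur_rev:     microcode revision currently active in the CPU
 * min_req_rev: minimal revision from the new microcode header
 *              (0 is assumed to mean "not provided")
 * enforce:     the command line switch is set
 */
static enum late_load_verdict check_min_rev(uint32_t cur_rev,
					    uint32_t min_req_rev,
					    bool enforce)
{
	bool safe = min_req_rev != 0 && cur_rev >= min_req_rev;

	if (safe)
		return LOAD_SAFE;

	return enforce ? LOAD_ABORTED : LOAD_TAINTED;
}
```

    Keeping the check free of side effects leaves tainting and message
    output to the caller, which matches the split between infrastructure and
    policy described in this change.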
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231017211724.079611170@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Protect against instrumentation [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:06 2023 +0200

    x86/microcode: Protect against instrumentation
    
    commit 1582c0f4a21303792f523fe2839dd8433ee630c0 upstream
    
    The wait for control loop in which the siblings are waiting for the
    microcode update on the primary thread must be protected against
    instrumentation as instrumentation can end up in #INT3, #DB or #PF,
    which then returns with IRET. That IRET reenables NMI which is the
    opposite of what the NMI rendezvous is trying to achieve.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.545969323@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Provide new control functions [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:02 2023 +0200

    x86/microcode: Provide new control functions
    
    commit 6067788f04b1020b316344fe34746f96d594a042 upstream
    
    The current all in one code is unreadable and really not suited for
    adding future features like uniform loading with package or system
    scope.
    
    Provide a set of new control functions which split the handling of the
    primary and secondary CPUs. These will replace the current rendezvous
    all in one function in the next step. This is intentionally a separate
    change because diff makes a completely unreadable mess otherwise.
    
    So the flow separates the primary and the secondary CPUs into their own
    functions which use the control field in the per CPU ucode_ctrl struct.
    
       primary()                    secondary()
        wait_for_all()               wait_for_all()
        apply_ucode()                wait_for_release()
        release()                    apply_ucode()
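    
    The flow in the diagram can be simulated single-threaded by stepping
    each CPU through its phases in round-robin order. This is a sketch only:
    the phase names, the four-CPU layout (even CPUs are primaries, odd CPUs
    their siblings) and all helper names are assumptions made for the
    example, and a plain counter stands in for the rendezvous counter.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS 4   /* two hypothetical cores with two threads each */

enum ctrl  { CTRL_WAIT, CTRL_APPLY };
enum phase { PH_ARRIVE, PH_WAIT_ALL, PH_WAIT_RELEASE, PH_APPLY, PH_DONE };

struct cpu {
	bool       primary;
	int        sibling;     /* the other thread on the same core */
	enum ctrl  ctrl;        /* written by the primary, read by the sibling */
	enum phase phase;
	int        applied_at;  /* tick at which apply_ucode() ran, -1 if not */
};

static struct cpu cpus[NCPUS];
static int arrived;             /* stands in for the rendezvous counter */

/* Advance one CPU by a single step of its control flow. */
static void step(int id, int tick)
{
	struct cpu *c = &cpus[id];

	switch (c->phase) {
	case PH_ARRIVE:
		arrived++;
		c->phase = PH_WAIT_ALL;
		break;
	case PH_WAIT_ALL:               /* wait_for_all() */
		if (arrived == NCPUS)
			c->phase = c->primary ? PH_APPLY : PH_WAIT_RELEASE;
		break;
	case PH_WAIT_RELEASE:           /* secondary: wait_for_release() */
		if (c->ctrl == CTRL_APPLY)
			c->phase = PH_APPLY;
		break;
	case PH_APPLY:                  /* apply_ucode() */
		assert(c->primary || c->ctrl == CTRL_APPLY);
		c->applied_at = tick;
		if (c->primary)
			cpus[c->sibling].ctrl = CTRL_APPLY;   /* release() */
		c->phase = PH_DONE;
		break;
	case PH_DONE:
		break;
	}
}

/* Round-robin over all CPUs until every one is done; returns ticks used. */
static int run_rendezvous(void)
{
	int tick = 0, id, done;

	arrived = 0;
	for (id = 0; id < NCPUS; id++)
		cpus[id] = (struct cpu){ id % 2 == 0, id ^ 1, CTRL_WAIT,
					 PH_ARRIVE, -1 };
	do {
		done = 0;
		for (id = 0; id < NCPUS; id++) {
			step(id, tick++);
			done += cpus[id].phase == PH_DONE;
		}
	} while (done < NCPUS);

	return tick;
}
```

    After run_rendezvous() every secondary has a larger applied_at value
    than its primary, which is the ordering the split control flow is meant
    to guarantee.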
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.377922731@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Remove pointless apply() invocation [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Tue Oct 17 23:23:49 2023 +0200

    x86/microcode: Remove pointless apply() invocation
    
    commit b48b26f992a3828b4ae274669f99ce68451d4904 upstream
    
    Microcode is applied on the APs during early bringup. There is no point
    in trying to apply the microcode again during the hotplug operations and
    neither at the point where the microcode device is initialized.
    
    Collect CPU info and microcode revision in setup_online_cpu() for now.
    This will move to the CPU hotplug callback later.
    
      [ bp: Leave the starting notifier for the following scenario:
    
        - boot, late load, suspend to disk, resume
    
        without the starting notifier, only the last core manages to update the
        microcode upon resume:
    
        # rdmsr -a 0x8b
        10000bf
        10000bf
        10000bf
        10000bf
        10000bf
        10000dc <----
    
        This is on an AMD F10h machine.
    
        For the future, one should check whether potential unification of
        the CPU init path could cover the resume path too so that this can
        be simplified even more.
    
      tglx: This is caused by the odd handling of APs which try to find the
      microcode blob in builtin or initrd instead of caching the microcode
      blob during early init before the APs are brought up. Will be cleaned
      up in a later step. ]
    
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/20231017211723.018821624@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Rendezvous and load in NMI [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:05 2023 +0200

    x86/microcode: Rendezvous and load in NMI
    
    commit 7eb314a22800457396f541c655697dabd71e44a7 upstream
    
    stop_machine() does not prevent the spin-waiting sibling from handling
    an NMI, which is obviously violating the whole concept of rendezvous.
    
    Implement a static branch right in the beginning of the NMI handler
    which is nopped out except when enabled by the late loading mechanism.
    
    The late loader enables the static branch before stop_machine() is
    invoked. Each CPU has an nmi_enable in its control structure which
    indicates whether the CPU should go into the update routine.
    
    This is required to bridge the gap between enabling the branch and
    actually being at the point where it is required to enter the loader
    wait loop.
    
    Each CPU which arrives in the stopper thread function sets that flag and
    issues a self NMI right after that. If the NMI function sees the flag
    clear, it returns. If it's set it clears the flag and enters the
    rendezvous.
    
    This is safe against a real NMI which hits in between setting the flag
    and sending the NMI to itself. The real NMI will be swallowed by the
    microcode update and the self NMI will then let stuff continue.
    Otherwise this would end up with a spurious NMI.
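    
    The flag handshake can be modelled without real NMIs. In this sketch a
    function call stands in for an NMI hitting the CPU, a counter stands in
    for entering the loader wait loop, and all names are hypothetical. It
    shows that however many NMIs arrive after the flag is set, the
    rendezvous is entered exactly once.

```c
#include <stdbool.h>

struct cpu_ctrl {
	bool nmi_enabled;  /* set by the stopper thread, cleared in the NMI */
	int  entries;      /* how often the rendezvous was entered          */
};

/* Models the hook at the start of the NMI handler. */
static bool ucode_nmi_handler(struct cpu_ctrl *c)
{
	if (!c->nmi_enabled)
		return false;   /* flag clear: not for the loader, return */

	c->nmi_enabled = false; /* consume the flag ...                   */
	c->entries++;           /* ... and enter the rendezvous           */
	return true;
}

/* Models the stopper thread setting the flag before the self NMI. */
static void stopper_arm(struct cpu_ctrl *c)
{
	c->nmi_enabled = true;
}

/* Replay: NMIs before arming, arm once, NMIs after arming. */
static int rendezvous_entries(int nmis_before_arm, int nmis_after_arm)
{
	struct cpu_ctrl c = { false, 0 };
	int i;

	for (i = 0; i < nmis_before_arm; i++)
		ucode_nmi_handler(&c);
	stopper_arm(&c);
	for (i = 0; i < nmis_after_arm; i++)
		ucode_nmi_handler(&c);

	return c.entries;
}
```

    rendezvous_entries(0, 2) models a real NMI followed by the self NMI:
    the first one consumes the flag and the second one sees it clear.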
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.489900814@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Replace the all-in-one rendevous handler [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 14:00:03 2023 +0200

    x86/microcode: Replace the all-in-one rendevous handler
    
    commit 0bf871651211b58c7b19f40b746b646d5311e2ec upstream
    
    with a new handler which just separates the control flow of primary and
    secondary CPUs.
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.433704135@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Rework early revisions reporting [+ + +]

Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Wed Nov 15 22:02:12 2023 +0100

    x86/microcode: Rework early revisions reporting
    
    commit 080990aa3344123673f686cda2df0d1b0deee046 upstream
    
    The AMD side of the loader issues the microcode revision for each
    logical thread on the system, which can become really noisy on huge
    machines. And doing that doesn't make a whole lot of sense - the
    microcode revision is already in /proc/cpuinfo.
    
    So in case one is interested in the theoretical support of mixed silicon
    steppings on AMD, one can check there.
    
    What is also missing on the AMD side - something which people have
    requested before - is showing the microcode revision the CPU had
    *before* the early update.
    
    So abstract that up in the main code and have the BSP on each vendor
    provide those revision numbers.
    
    Then, dump them only once on driver init.
    
    On Intel, do not dump the patch date - it is not needed.
    
    Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

x86/microcode: Sanitize __wait_for_cpus() [+ + +]

Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Mon Oct 2 13:59:59 2023 +0200

    x86/microcode: Sanitize __wait_for_cpus()
    
    commit 0772b9aa1a8f7322dce8588c231cff8b57298a53 upstream
    
    The code is too complicated for no reason:
    
     - The return value is pointless as this is a strict boolean.
    
     - It's way simpler to count down from num_online_cpus() and check for
       zero.
    
     - The timeout argument is pointless as this is always one second.
    
     - Touching the NMI watchdog every 100ns does not make any sense, neither
       does checking every 100ns. This is really not a hotpath operation.
    
    Preload the atomic counter with the number of online CPUs and simplify the
    whole timeout logic. Delay for one microsecond and touch the NMI watchdog
    once per millisecond.
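    
    A userspace sketch of the simplified wait loop, assuming C11 atomics in
    place of the kernel's atomic_t. The delay and watchdog functions are
    stubs invented for the example, so no real time passes, and
    simulate_wait() is a hypothetical helper which runs the loop for a
    single CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define USEC_PER_SEC  1000000u
#define USEC_PER_MSEC 1000u

static unsigned int watchdog_touches;   /* counts stubbed watchdog pokes */

static void udelay_stub(unsigned int usec) { (void)usec; }
static void touch_nmi_watchdog_stub(void)  { watchdog_touches++; }

/*
 * Count down for the calling CPU, then poll until the counter hits zero
 * or one second worth of one-microsecond delays has passed. Returns true
 * when all CPUs arrived.
 */
static bool wait_for_cpus(atomic_int *cnt)
{
	unsigned int timeout;
	int prev = atomic_fetch_sub(cnt, 1);

	assert(prev > 0);   /* counter was preloaded with the CPU count */

	for (timeout = 0; timeout < USEC_PER_SEC; timeout++) {
		if (atomic_load(cnt) == 0)
			return true;

		udelay_stub(1);

		/* Touch the watchdog once per millisecond. */
		if ((timeout % USEC_PER_MSEC) == 0)
			touch_nmi_watchdog_stub();
	}
	return false;
}

/* Run the loop for one CPU with 'pending' CPUs still outstanding. */
static bool simulate_wait(int pending)
{
	atomic_int cnt;

	watchdog_touches = 0;
	atomic_init(&cnt, pending);
	return wait_for_cpus(&cnt);
}
```

    simulate_wait(1) models the last CPU to arrive and returns at once.
    simulate_wait(2) models a missing CPU: the loop runs for the full second
    of simulated delay and touches the watchdog 1000 times before giving up.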
    
    Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/r/20231002115903.204251527@linutronix.de
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>