U.S. flag   An official website of the United States government
Dot gov

Official websites use .gov
A .gov website belongs to an official government organization in the United States.

Https

Secure .gov websites use HTTPS
A lock (Dot gov) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Search Results (Refine Search)

Search Parameters:
There are 234,970 matching records.
Displaying matches 1,701 through 1,720.
Vuln ID Summary CVSS Severity
CVE-2024-27009

In the Linux kernel, the following vulnerability has been resolved: s390/cio: fix race condition during online processing A race condition exists in ccw_device_set_online() that can cause the online process to fail, leaving the affected device in an inconsistent state. As a result, subsequent attempts to set that device online fail with return code ENODEV. The problem occurs when a path verification request arrives after a wait for final device state completed, but before the result state is evaluated. Fix this by ensuring that the CCW-device lock is held between determining final state and checking result state. Note that since: commit 2297791c92d0 ("s390/cio: dont unregister subchannel from child-drivers") path verification requests are much more likely to occur during boot, resulting in an increased chance of this race condition occurring.

Published: May 01, 2024; 2:15:19 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27008

In the Linux kernel, the following vulnerability has been resolved: drm: nv04: Fix out of bounds access When Output Resource (dcb->or) value is assigned in fabricate_dcb_output(), there may be out of bounds access to dac_users array in case dcb->or is zero because ffs(dcb->or) is used as index there. The 'or' argument of fabricate_dcb_output() must be interpreted as a number of bit to set, not value. Utilize macros from 'enum nouveau_or' in calls instead of hardcoding. Found by Linux Verification Center (linuxtesting.org) with SVACE.

Published: May 01, 2024; 2:15:19 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27007

In the Linux kernel, the following vulnerability has been resolved: userfaultfd: change src_folio after ensuring it's unpinned in UFFDIO_MOVE Commit d7a08838ab74 ("mm: userfaultfd: fix unexpected change to src_folio when UFFDIO_MOVE fails") moved the src_folio->{mapping, index} changing to after clearing the page-table and ensuring that it's not pinned. This avoids failure of swapout+migration and possibly memory corruption. However, the commit missed fixing it in the huge-page case.

Published: May 01, 2024; 2:15:19 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27006

In the Linux kernel, the following vulnerability has been resolved: thermal/debugfs: Add missing count increment to thermal_debug_tz_trip_up() The count field in struct trip_stats, representing the number of times the zone temperature was above the trip point, needs to be incremented in thermal_debug_tz_trip_up(), for two reasons. First, if a trip point is crossed on the way up for the first time, thermal_debug_update_temp() called from update_temperature() does not see it because it has not been added to trips_crossed[] array in the thermal zone's struct tz_debugfs object yet. Therefore, when thermal_debug_tz_trip_up() is called after that, the trip point's count value is 0, and the attempt to divide by it during the average temperature computation leads to a divide error which causes the kernel to crash. Setting the count to 1 before the division by incrementing it fixes this problem. Second, if a trip point is crossed on the way up, but it has been crossed on the way up already before, its count value needs to be incremented to make a record of the fact that the zone temperature is above the trip now. Without doing that, if the mitigations applied after crossing the trip cause the zone temperature to drop below its threshold, the count will not be updated for this episode at all and the average temperature in the trip statistics record will be somewhat higher than it should be. Cc :6.8+ <stable@vger.kernel.org> # 6.8+

Published: May 01, 2024; 2:15:19 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27005

In the Linux kernel, the following vulnerability has been resolved: interconnect: Don't access req_list while it's being manipulated The icc_lock mutex was split into separate icc_lock and icc_bw_lock mutexes in [1] to avoid lockdep splats. However, this didn't adequately protect access to icc_node::req_list. The icc_set_bw() function will eventually iterate over req_list while only holding icc_bw_lock, but req_list can be modified while only holding icc_lock. This causes races between icc_set_bw(), of_icc_get(), and icc_put(). Example A: CPU0 CPU1 ---- ---- icc_set_bw(path_a) mutex_lock(&icc_bw_lock); icc_put(path_b) mutex_lock(&icc_lock); aggregate_requests() hlist_for_each_entry(r, ... hlist_del(... <r = invalid pointer> Example B: CPU0 CPU1 ---- ---- icc_set_bw(path_a) mutex_lock(&icc_bw_lock); path_b = of_icc_get() of_icc_get_by_index() mutex_lock(&icc_lock); path_find() path_init() aggregate_requests() hlist_for_each_entry(r, ... hlist_add_head(... <r = invalid pointer> Fix this by ensuring icc_bw_lock is always held before manipulating icc_node::req_list. The additional places icc_bw_lock is held don't perform any memory allocations, so we should still be safe from the original lockdep splats that motivated the separate locks. [1] commit af42269c3523 ("interconnect: Fix locking for runpm vs reclaim")

Published: May 01, 2024; 2:15:18 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27004

In the Linux kernel, the following vulnerability has been resolved: clk: Get runtime PM before walking tree during disable_unused Doug reported [1] the following hung task: INFO: task swapper/0:1 blocked for more than 122 seconds. Not tainted 5.15.149-21875-gf795ebc40eb8 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00000008 Call trace: __switch_to+0xf4/0x1f4 __schedule+0x418/0xb80 schedule+0x5c/0x10c rpm_resume+0xe0/0x52c rpm_resume+0x178/0x52c __pm_runtime_resume+0x58/0x98 clk_pm_runtime_get+0x30/0xb0 clk_disable_unused_subtree+0x58/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused_subtree+0x38/0x208 clk_disable_unused+0x4c/0xe4 do_one_initcall+0xcc/0x2d8 do_initcall_level+0xa4/0x148 do_initcalls+0x5c/0x9c do_basic_setup+0x24/0x30 kernel_init_freeable+0xec/0x164 kernel_init+0x28/0x120 ret_from_fork+0x10/0x20 INFO: task kworker/u16:0:9 blocked for more than 122 seconds. Not tainted 5.15.149-21875-gf795ebc40eb8 #1 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u16:0 state:D stack: 0 pid: 9 ppid: 2 flags:0x00000008 Workqueue: events_unbound deferred_probe_work_func Call trace: __switch_to+0xf4/0x1f4 __schedule+0x418/0xb80 schedule+0x5c/0x10c schedule_preempt_disabled+0x2c/0x48 __mutex_lock+0x238/0x488 __mutex_lock_slowpath+0x1c/0x28 mutex_lock+0x50/0x74 clk_prepare_lock+0x7c/0x9c clk_core_prepare_lock+0x20/0x44 clk_prepare+0x24/0x30 clk_bulk_prepare+0x40/0xb0 mdss_runtime_resume+0x54/0x1c8 pm_generic_runtime_resume+0x30/0x44 __genpd_runtime_resume+0x68/0x7c genpd_runtime_resume+0x108/0x1f4 __rpm_callback+0x84/0x144 rpm_callback+0x30/0x88 rpm_resume+0x1f4/0x52c rpm_resume+0x178/0x52c __pm_runtime_resume+0x58/0x98 __device_attach+0xe0/0x170 device_initial_probe+0x1c/0x28 bus_probe_device+0x3c/0x9c device_add+0x644/0x814 mipi_dsi_device_register_full+0xe4/0x170 devm_mipi_dsi_device_register_full+0x28/0x70 ti_sn_bridge_probe+0x1dc/0x2c0 auxiliary_bus_probe+0x4c/0x94 really_probe+0xcc/0x2c8 __driver_probe_device+0xa8/0x130 driver_probe_device+0x48/0x110 __device_attach_driver+0xa4/0xcc bus_for_each_drv+0x8c/0xd8 __device_attach+0xf8/0x170 device_initial_probe+0x1c/0x28 bus_probe_device+0x3c/0x9c deferred_probe_work_func+0x9c/0xd8 process_one_work+0x148/0x518 worker_thread+0x138/0x350 kthread+0x138/0x1e0 ret_from_fork+0x10/0x20 The first thread is walking the clk tree and calling clk_pm_runtime_get() to power on devices required to read the clk hardware via struct clk_ops::is_enabled(). This thread holds the clk prepare_lock, and is trying to runtime PM resume a device, when it finds that the device is in the process of resuming so the thread schedule()s away waiting for the device to finish resuming before continuing. The second thread is runtime PM resuming the same device, but the runtime resume callback is calling clk_prepare(), trying to grab the prepare_lock waiting on the first thread. This is a classic ABBA deadlock. To properly fix the deadlock, we must never runtime PM resume or suspend a device with the clk prepare_lock held. Actually doing that is near impossible today because the global prepare_lock would have to be dropped in the middle of the tree, the device runtime PM resumed/suspended, and then the prepare_lock grabbed again to ensure consistency of the clk tree topology. If anything changes with the clk tree in the meantime, we've lost and will need to start the operation all over again. Luckily, most of the time we're simply incrementing or decrementing the runtime PM count on an active device, so we don't have the chance to schedule away with the prepare_lock held. Let's fix this immediate problem that can be ---truncated---

Published: May 01, 2024; 2:15:18 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27003

In the Linux kernel, the following vulnerability has been resolved: clk: Get runtime PM before walking tree for clk_summary Similar to the previous commit, we should make sure that all devices are runtime resumed before printing the clk_summary through debugfs. Failure to do so would result in a deadlock if the thread is resuming a device to print clk state and that device is also runtime resuming in another thread, e.g the screen is turning on and the display driver is starting up. We remove the calls to clk_pm_runtime_{get,put}() in this path because they're superfluous now that we know the devices are runtime resumed. This also squashes a bug where the return value of clk_pm_runtime_get() wasn't checked, leading to an RPM count underflow on error paths.

Published: May 01, 2024; 2:15:18 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27002

In the Linux kernel, the following vulnerability has been resolved: clk: mediatek: Do a runtime PM get on controllers during probe mt8183-mfgcfg has a mutual dependency with genpd during the probing stage, which leads to a deadlock in the following call stack: CPU0: genpd_lock --> clk_prepare_lock genpd_power_off_work_fn() genpd_lock() generic_pm_domain::power_off() clk_unprepare() clk_prepare_lock() CPU1: clk_prepare_lock --> genpd_lock clk_register() __clk_core_init() clk_prepare_lock() clk_pm_runtime_get() genpd_lock() Do a runtime PM get at the probe function to make sure clk_register() won't acquire the genpd lock. Instead of only modifying mt8183-mfgcfg, do this on all mediatek clock controller probings because we don't believe this would cause any regression. Verified on MT8183 and MT8192 Chromebooks.

Published: May 01, 2024; 2:15:18 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27001

In the Linux kernel, the following vulnerability has been resolved: comedi: vmk80xx: fix incomplete endpoint checking While vmk80xx does have endpoint checking implemented, some things can fall through the cracks. Depending on the hardware model, URBs can have either bulk or interrupt type, and current version of vmk80xx_find_usb_endpoints() function does not take that fully into account. While this warning does not seem to be too harmful, at the very least it will crash systems with 'panic_on_warn' set on them. Fix the issue found by Syzkaller [1] by somewhat simplifying the endpoint checking process with usb_find_common_endpoints() and ensuring that only expected endpoint types are present. This patch has not been tested on real hardware. [1] Syzkaller report: usb 1-1: BOGUS urb xfer, pipe 1 != type 3 WARNING: CPU: 0 PID: 781 at drivers/usb/core/urb.c:504 usb_submit_urb+0xc4e/0x18c0 drivers/usb/core/urb.c:503 ... Call Trace: <TASK> usb_start_wait_urb+0x113/0x520 drivers/usb/core/message.c:59 vmk80xx_reset_device drivers/comedi/drivers/vmk80xx.c:227 [inline] vmk80xx_auto_attach+0xa1c/0x1a40 drivers/comedi/drivers/vmk80xx.c:818 comedi_auto_config+0x238/0x380 drivers/comedi/drivers.c:1067 usb_probe_interface+0x5cd/0xb00 drivers/usb/core/driver.c:399 ... Similar issue also found by Syzkaller:

Published: May 01, 2024; 2:15:18 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-27000

In the Linux kernel, the following vulnerability has been resolved: serial: mxs-auart: add spinlock around changing cts state The uart_handle_cts_change() function in serial_core expects the caller to hold uport->lock. For example, I have seen the below kernel splat, when the Bluetooth driver is loaded on an i.MX28 board. [ 85.119255] ------------[ cut here ]------------ [ 85.124413] WARNING: CPU: 0 PID: 27 at /drivers/tty/serial/serial_core.c:3453 uart_handle_cts_change+0xb4/0xec [ 85.134694] Modules linked in: hci_uart bluetooth ecdh_generic ecc wlcore_sdio configfs [ 85.143314] CPU: 0 PID: 27 Comm: kworker/u3:0 Not tainted 6.6.3-00021-gd62a2f068f92 #1 [ 85.151396] Hardware name: Freescale MXS (Device Tree) [ 85.156679] Workqueue: hci0 hci_power_on [bluetooth] (...) [ 85.191765] uart_handle_cts_change from mxs_auart_irq_handle+0x380/0x3f4 [ 85.198787] mxs_auart_irq_handle from __handle_irq_event_percpu+0x88/0x210 (...)

Published: May 01, 2024; 2:15:18 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26999

In the Linux kernel, the following vulnerability has been resolved: serial/pmac_zilog: Remove flawed mitigation for rx irq flood The mitigation was intended to stop the irq completely. That may be better than a hard lock-up but it turns out that you get a crash anyway if you're using pmac_zilog as a serial console: ttyPZ0: pmz: rx irq flood ! BUG: spinlock recursion on CPU#0, swapper/0 That's because the pr_err() call in pmz_receive_chars() results in pmz_console_write() attempting to lock a spinlock already locked in pmz_interrupt(). With CONFIG_DEBUG_SPINLOCK=y, this produces a fatal BUG splat. The spinlock in question is the one in struct uart_port. Even when it's not fatal, the serial port rx function ceases to work. Also, the iteration limit doesn't play nicely with QEMU, as can be seen in the bug report linked below. A web search for other reports of the error message "pmz: rx irq flood" didn't produce anything. So I don't think this code is needed any more. Remove it.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26998

In the Linux kernel, the following vulnerability has been resolved: serial: core: Clearing the circular buffer before NULLifying it The circular buffer is NULLified in uart_tty_port_shutdown() under the spin lock. However, the PM or other timer based callbacks may still trigger after this event without knowning that buffer pointer is not valid. Since the serial code is a bit inconsistent in checking the buffer state (some rely on the head-tail positions, some on the buffer pointer), it's better to have both aligned, i.e. buffer pointer to be NULL and head-tail possitions to be the same, meaning it's empty. This will prevent asynchronous calls to dereference NULL pointer as reported recently in 8250 case: BUG: kernel NULL pointer dereference, address: 00000cf5 Workqueue: pm pm_runtime_work EIP: serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) ... ? serial8250_tx_chars (drivers/tty/serial/8250/8250_port.c:1809) __start_tx (drivers/tty/serial/8250/8250_port.c:1551) serial8250_start_tx (drivers/tty/serial/8250/8250_port.c:1654) serial_port_runtime_suspend (include/linux/serial_core.h:667 drivers/tty/serial/serial_port.c:63) __rpm_callback (drivers/base/power/runtime.c:393) ? serial_port_remove (drivers/tty/serial/serial_port.c:50) rpm_suspend (drivers/base/power/runtime.c:447) The proposed change will prevent ->start_tx() to be called during suspend on shut down port.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26997

In the Linux kernel, the following vulnerability has been resolved: usb: dwc2: host: Fix dereference issue in DDMA completion flow. Fixed variable dereference issue in DDMA completion flow.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26996

In the Linux kernel, the following vulnerability has been resolved: usb: gadget: f_ncm: Fix UAF ncm object at re-bind after usb ep transport error When ncm function is working and then stop usb0 interface for link down, eth_stop() is called. At this piont, accidentally if usb transport error should happen in usb_ep_enable(), 'in_ep' and/or 'out_ep' may not be enabled. After that, ncm_disable() is called to disable for ncm unbind but gether_disconnect() is never called since 'in_ep' is not enabled. As the result, ncm object is released in ncm unbind but 'dev->port_usb' associated to 'ncm->port' is not NULL. And when ncm bind again to recover netdev, ncm object is reallocated but usb0 interface is already associated to previous released ncm object. Therefore, once usb0 interface is up and eth_start_xmit() is called, released ncm object is dereferrenced and it might cause use-after-free memory. [function unlink via configfs] usb0: eth_stop dev->port_usb=ffffff9b179c3200 --> error happens in usb_ep_enable(). NCM: ncm_disable: ncm=ffffff9b179c3200 --> no gether_disconnect() since ncm->port.in_ep->enabled is false. NCM: ncm_unbind: ncm unbind ncm=ffffff9b179c3200 NCM: ncm_free: ncm free ncm=ffffff9b179c3200 <-- released ncm [function link via configfs] NCM: ncm_alloc: ncm alloc ncm=ffffff9ac4f8a000 NCM: ncm_bind: ncm bind ncm=ffffff9ac4f8a000 NCM: ncm_set_alt: ncm=ffffff9ac4f8a000 alt=0 usb0: eth_open dev->port_usb=ffffff9b179c3200 <-- previous released ncm usb0: eth_start dev->port_usb=ffffff9b179c3200 <-- eth_start_xmit() --> dev->wrap() Unable to handle kernel paging request at virtual address dead00000000014f This patch addresses the issue by checking if 'ncm->netdev' is not NULL at ncm_disable() to call gether_disconnect() to deassociate 'dev->port_usb'. It's more reasonable to check 'ncm->netdev' to call gether_connect/disconnect rather than check 'ncm->port.in_ep->enabled' since it might not be enabled but the gether connection might be established.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26995

In the Linux kernel, the following vulnerability has been resolved: usb: typec: tcpm: Correct the PDO counting in pd_set Off-by-one errors happen because nr_snk_pdo and nr_src_pdo are incorrectly added one. The index of the loop is equal to the number of PDOs to be updated when leaving the loop and it doesn't need to be added one. When doing the power negotiation, TCPM relies on the "nr_snk_pdo" as the size of the local sink PDO array to match the Source capabilities of the partner port. If the off-by-one overflow occurs, a wrong RDO might be sent and unexpected power transfer might happen such as over voltage or over current (than expected). "nr_src_pdo" is used to set the Rp level when the port is in Source role. It is also the array size of the local Source capabilities when filling up the buffer which will be sent as the Source PDOs (such as in Power Negotiation). If the off-by-one overflow occurs, a wrong Rp level might be set and wrong Source PDOs will be sent to the partner port. This could potentially cause over current or port resets.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26994

In the Linux kernel, the following vulnerability has been resolved: speakup: Avoid crash on very long word In case a console is set up really large and contains a really long word (> 256 characters), we have to stop before the length of the word buffer.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26993

In the Linux kernel, the following vulnerability has been resolved: fs: sysfs: Fix reference leak in sysfs_break_active_protection() The sysfs_break_active_protection() routine has an obvious reference leak in its error path. If the call to kernfs_find_and_get() fails then kn will be NULL, so the companion sysfs_unbreak_active_protection() routine won't get called (and would only cause an access violation by trying to dereference kn->parent if it was called). As a result, the reference to kobj acquired at the start of the function will never be released. Fix the leak by adding an explicit kobject_put() call when kn is NULL.

Published: May 01, 2024; 2:15:17 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26992

In the Linux kernel, the following vulnerability has been resolved: KVM: x86/pmu: Disable support for adaptive PEBS Drop support for virtualizing adaptive PEBS, as KVM's implementation is architecturally broken without an obvious/easy path forward, and because exposing adaptive PEBS can leak host LBRs to the guest, i.e. can leak host kernel addresses to the guest. Bug #1 is that KVM doesn't account for the upper 32 bits of IA32_FIXED_CTR_CTRL when (re)programming fixed counters, e.g fixed_ctrl_field() drops the upper bits, reprogram_fixed_counters() stores local variables as u8s and truncates the upper bits too, etc. Bug #2 is that, because KVM _always_ sets precise_ip to a non-zero value for PEBS events, perf will _always_ generate an adaptive record, even if the guest requested a basic record. Note, KVM will also enable adaptive PEBS in individual *counter*, even if adaptive PEBS isn't exposed to the guest, but this is benign as MSR_PEBS_DATA_CFG is guaranteed to be zero, i.e. the guest will only ever see Basic records. Bug #3 is in perf. intel_pmu_disable_fixed() doesn't clear the upper bits either, i.e. leaves ICL_FIXED_0_ADAPTIVE set, and intel_pmu_enable_fixed() effectively doesn't clear ICL_FIXED_0_ADAPTIVE either. I.e. perf _always_ enables ADAPTIVE counters, regardless of what KVM requests. Bug #4 is that adaptive PEBS *might* effectively bypass event filters set by the host, as "Updated Memory Access Info Group" records information that might be disallowed by userspace via KVM_SET_PMU_EVENT_FILTER. Bug #5 is that KVM doesn't ensure LBR MSRs hold guest values (or at least zeros) when entering a vCPU with adaptive PEBS, which allows the guest to read host LBRs, i.e. host RIPs/addresses, by enabling "LBR Entries" records. Disable adaptive PEBS support as an immediate fix due to the severity of the LBR leak in particular, and because fixing all of the bugs will be non-trivial, e.g. not suitable for backporting to stable kernels. Note! This will break live migration, but trying to make KVM play nice with live migration would be quite complicated, wouldn't be guaranteed to work (i.e. KVM might still kill/confuse the guest), and it's not clear that there are any publicly available VMMs that support adaptive PEBS, let alone live migrate VMs that support adaptive PEBS, e.g. QEMU doesn't support PEBS in any capacity.

Published: May 01, 2024; 2:15:16 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26991

In the Linux kernel, the following vulnerability has been resolved: KVM: x86/mmu: x86: Don't overflow lpage_info when checking attributes Fix KVM_SET_MEMORY_ATTRIBUTES to not overflow lpage_info array and trigger KASAN splat, as seen in the private_mem_conversions_test selftest. When memory attributes are set on a GFN range, that range will have specific properties applied to the TDP. A huge page cannot be used when the attributes are inconsistent, so they are disabled for those the specific huge pages. For internal KVM reasons, huge pages are also not allowed to span adjacent memslots regardless of whether the backing memory could be mapped as huge. What GFNs support which huge page sizes is tracked by an array of arrays 'lpage_info' on the memslot, of ‘kvm_lpage_info’ structs. Each index of lpage_info contains a vmalloc allocated array of these for a specific supported page size. The kvm_lpage_info denotes whether a specific huge page (GFN and page size) on the memslot is supported. These arrays include indices for unaligned head and tail huge pages. Preventing huge pages from spanning adjacent memslot is covered by incrementing the count in head and tail kvm_lpage_info when the memslot is allocated, but disallowing huge pages for memory that has mixed attributes has to be done in a more complicated way. During the KVM_SET_MEMORY_ATTRIBUTES ioctl KVM updates lpage_info for each memslot in the range that has mismatched attributes. KVM does this a memslot at a time, and marks a special bit, KVM_LPAGE_MIXED_FLAG, in the kvm_lpage_info for any huge page. This bit is essentially a permanently elevated count. So huge pages will not be mapped for the GFN at that page size if the count is elevated in either case: a huge head or tail page unaligned to the memslot or if KVM_LPAGE_MIXED_FLAG is set because it has mixed attributes. To determine whether a huge page has consistent attributes, the KVM_SET_MEMORY_ATTRIBUTES operation checks an xarray to make sure it consistently has the incoming attribute. Since level - 1 huge pages are aligned to level huge pages, it employs an optimization. As long as the level - 1 huge pages are checked first, it can just check these and assume that if each level - 1 huge page contained within the level sized huge page is not mixed, then the level size huge page is not mixed. This optimization happens in the helper hugepage_has_attrs(). Unfortunately, although the kvm_lpage_info array representing page size 'level' will contain an entry for an unaligned tail page of size level, the array for level - 1 will not contain an entry for each GFN at page size level. The level - 1 array will only contain an index for any unaligned region covered by level - 1 huge page size, which can be a smaller region. So this causes the optimization to overflow the level - 1 kvm_lpage_info and perform a vmalloc out of bounds read. In some cases of head and tail pages where an overflow could happen, callers skip the operation completely as KVM_LPAGE_MIXED_FLAG is not required to prevent huge pages as discussed earlier. But for memslots that are smaller than the 1GB page size, it does call hugepage_has_attrs(). In this case the huge page is both the head and tail page. The issue can be observed simply by compiling the kernel with CONFIG_KASAN_VMALLOC and running the selftest “private_mem_conversions_test”, which produces the output like the following: BUG: KASAN: vmalloc-out-of-bounds in hugepage_has_attrs+0x7e/0x110 Read of size 4 at addr ffffc900000a3008 by task private_mem_con/169 Call Trace: dump_stack_lvl print_report ? __virt_addr_valid ? hugepage_has_attrs ? hugepage_has_attrs kasan_report ? hugepage_has_attrs hugepage_has_attrs kvm_arch_post_set_memory_attributes kvm_vm_ioctl It is a little ambiguous whether the unaligned head page (in the bug case also the tail page) should be expected to have KVM_LPAGE_MIXED_FLAG set. It is not functionally required, as the unal ---truncated---

Published: May 01, 2024; 2:15:16 AM -0400
V3.x:(not available)
V2.0:(not available)
CVE-2024-26990

In the Linux kernel, the following vulnerability has been resolved: KVM: x86/mmu: Write-protect L2 SPTEs in TDP MMU when clearing dirty status Check kvm_mmu_page_ad_need_write_protect() when deciding whether to write-protect or clear D-bits on TDP MMU SPTEs, so that the TDP MMU accounts for any role-specific reasons for disabling D-bit dirty logging. Specifically, TDP MMU SPTEs must be write-protected when the TDP MMU is being used to run an L2 (i.e. L1 has disabled EPT) and PML is enabled. KVM always disables PML when running L2, even when L1 and L2 GPAs are in the some domain, so failing to write-protect TDP MMU SPTEs will cause writes made by L2 to not be reflected in the dirty log. [sean: massage shortlog and changelog, tweak ternary op formatting]

Published: May 01, 2024; 2:15:16 AM -0400
V3.x:(not available)
V2.0:(not available)