Changed |
Description |
In the Linux kernel, the following vulnerability has been resolved:
smb: client: Fix netns refcount imbalance causing leaks and use-after-free
Commit ef7134c7fc48 ("smb: client: Fix use-after-free of network
namespace.") attempted to fix a netns use-after-free issue by manually
adjusting reference counts via sk->sk_net_refcnt and sock_inuse_add().
However, a later commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock
after rmmod") pointed out that the approach of manually setting
sk->sk_net_refcnt in the first commit was technically incorrect, as
sk->sk_net_refcnt should only be set for user sockets. It led to issues
like TCP timers not being cleared properly on close. The second commit
moved to a model of just holding an extra netns reference for
server->ssocket using get_net(), and dropping it when the server is torn
down.
But there remain some gaps in the get_net()/put_net() balancing added by
these commits. The incomplete reference handling in these fixes results
in two issues:
1. Netns refcount leaks[1]
The problem process is as follows:
```
mount.cifs cifsd
cifs_do_mount
cifs_mount
cifs_mount_get_session
cifs_get_tcp_session
get_net() /* First get net. */
ip_connect
generic_ip_connect /* Try port 445 */
get_net()
->connect() /* Failed */
put_net()
generic_ip_connect /* Try port 139 */
get_net() /* Missing matching put_net() for this get_net().*/
cifs_get_smb_ses
cifs_negotiate_protocol
smb2_negotiate
SMB2_negotiate
cifs_send_recv
wait_for_response
cifs_demultiplex_thread
cifs_read_from_socket
cifs_readv_from_socket
cifs_reconnect
cifs_abort_connection
sock_release();
server->ssocket = NULL;
/* Missing put_net() here. */
generic_ip_connect
get_net()
->connect() /* Failed */
put_net()
sock_release();
server->ssocket = NULL;
free_rsp_buf
...
clean_demultiplex_info
/* It's only called once here. */
put_net()
```
When cifs_reconnect() is triggered, the server->ssocket is released
without a corresponding put_net() for the reference acquired in
generic_ip_connect() before. it ends up calling generic_ip_connect()
again to retry get_net(). After that, server->ssocket is set to NULL
in the error path of generic_ip_connect(), and the net count cannot be
released in the final clean_demultiplex_info() function.
2. Potential use-after-free
The current refcounting scheme can lead to a potential use-after-free issue
in the following scenario:
```
cifs_do_mount
cifs_mount
cifs_mount_get_session
cifs_get_tcp_session
get_net() /* First get net */
ip_connect
generic_ip_connect
get_net()
bind_socket
kernel_bind /* failed */
put_net()
/* after out_err_crypto_release label */
put_net()
/* after out_err label */
put_net()
```
In the exception handling process where binding the socket fails, the
get_net() and put_net() calls are unbalanced, which may cause the
server->net reference count to drop to zero and be prematurely released.
To address both issues, this patch ties the netns reference counti
---truncated---
|
In the Linux kernel, the following vulnerability has been resolved:
Revert "smb: client: fix TCP timers deadlock after rmmod"
This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801.
Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is
mentioned as CVE-2024-54680, but is actually did not fix anything;
The issue can be reproduced on top of it. [0]
Also, it reverted the change by commit ef7134c7fc48 ("smb: client:
Fix use-after-free of network namespace.") and introduced a real
issue by reviving the kernel TCP socket.
When a reconnect happens for a CIFS connection, the socket state
transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync()
in tcp_close() stops all timers for the socket.
If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1
forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans.
Usually, FIN can be retransmitted by the peer, but if the peer aborts
the connection, the issue comes into reality.
I warned about this privately by pointing out the exact report [1],
but the bogus fix was finally merged.
So, we should not stop the timers to finally kill the connection on
our side in that case, meaning we must not use a kernel socket for
TCP whose sk->sk_net_refcnt is 0.
The kernel socket does not have a reference to its netns to make it
possible to tear down netns without cleaning up every resource in it.
For example, tunnel devices use a UDP socket internally, but we can
destroy netns without removing such devices and let it complete
during exit. Otherwise, netns would be leaked when the last application
died.
However, this is problematic for TCP sockets because TCP has timers to
close the connection gracefully even after the socket is close()d. The
lifetime of the socket and its netns is different from the lifetime of
the underlying connection.
If the socket user does not maintain the netns lifetime, the timer could
be fired after the socket is close()d and its netns is freed up, resulting
in use-after-free.
Actually, we have seen so many similar issues and converted such sockets
to have a reference to netns.
That's why I converted the CIFS client socket to have a reference to
netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope
of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right
fix**.
Regarding the LOCKDEP issue, we can prevent the module unload by
bumping the module refcount when switching the LOCKDDEP key in
sock_lock_init_class_and_name(). [2]
For a while, let's revert the bogus fix.
Note that now we can use sk_net_refcnt_upgrade() for the socket
conversion, but I'll do so later separately to make backport easy.
|