5.15.0-3.2-liquorix-amd64: machine reboots upon ifdown enp6s0 ?!
Hi,
first of all, thanks for bringing 5.15 to us in lqx flavour! Unfortunately, 5.15 introduced a very severe bug that appears to be related to networking and that did not occur with 5.14. Please don't consider this a rant; I'm just absolutely overwhelmed by what happens. Here are the symptoms:
1. NIC configured as DHCP
a. When I try to shut down my machine, it gets stuck at "...stop job for ifdown enp6s0..." after about 14s of the timeout, and the system just freezes. Hard reset is the only way out.
b. When I run ifdown enp6s0 manually in the terminal, I get the error
:: Code ::
send_packet: Operation not permitted
dhclient.c:3010: Failed to send 300 byte long packet over fallback interface.
and the system freezes after a couple of seconds. Hard reset is the only way out.
2. NIC configured as static
a. When I try to shut down my machine, I observe the same behaviour as with DHCP.
b. When I run ifdown enp6s0 manually in the terminal (and now it's getting really absurd), I get the error
:: Code ::
/etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf is not a symbolic link to /run/resolvconf/resolv.conf
AND my machine just reboots! Just to make that clear: I did not issue any reboot or shutdown command! :-)
Anyway, this also happens with the stock 5.15 kernel, so it appears not to be purely a lqx issue. But would it be possible to give us a pointer to what could have caused this regression compared to 5.14, and where best to report it?
Here's my system (yeah, early adopters again ;-) ):
:: Code ::
$ inxi -bxz
System: Kernel: 5.15.0-3.2-liquorix-amd64 x86_64 bits: 64 compiler: N/A Desktop: Xfce 4.16.3 Distro: Ubuntu 20.04.3 LTS (Focal Fossa)
Machine: Type: Desktop System: ASUS product: N/A v: N/A serial: <filter> Mobo: ASUSTeK model: ROG STRIX Z690-A GAMING WIFI D4 v: Rev 1.xx serial: <filter> UEFI [Legacy]: American Megatrends v: 0707 date: 11/10/2021
CPU: 8-Core: 12th Gen Intel Core i9-12900K type: MT MCP arch: N/A speed: 5600 MHz max: 3201 MHz
Graphics: Device-1: NVIDIA vendor: Gigabyte driver: nvidia v: 470.86 bus ID: 01:00.0 Display: x11 server: X.Org 1.20.11 driver: nvidia tty: N/A Message: Unable to show advanced data. Required tool glxinfo missing.
Network: Device-1: Intel vendor: ASUSTeK driver: igc v: kernel port: 4000 bus ID: 06:00.0
Drives: Local Storage: total: 355.84 GiB used: 1.20 TiB (344.0%)
Info: Processes: 337 Uptime: 16m Memory: 31.17 GiB used: 1.53 GiB (4.9%) Init: systemd runlevel: 5 Compilers: gcc: 9.4.0 Shell: bash v: 5.0.17 inxi: 3.0.38
Any help much appreciated!
r.
|||||
Here's an important follow-up:
With 5.14, issuing ifdown enp6s0 produces the same error messages described above in both scenarios, but no freezes or reboots occur! That is, the interface is simply stopped and can be brought up again without any problems.
|||||
Are you sure this isn't a systemd regression, maybe triggered by a change in the kernel? The "stop job is running" thing is one of the hallmarks of a systemd mistake or logic error. It's one reason I stopped shutting down my system and started using suspend or hibernate instead; I got sick of those "stop job is running" messages that run eternally. The crash during it, however, sounds like a kernel thing. That should be documented in journald somewhere, assuming you have it set to log the previous session as well as the current boot session.
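Whether the journal of the crashed boot is still available depends on journald's storage mode. Here is a minimal Python sketch, assuming the default Storage=auto behaviour: logs are kept across reboots only if /var/log/journal exists, otherwise they live in /run/log/journal and are lost. The function name journal_storage is made up for illustration, and an explicit Storage= setting in journald.conf is not checked.

```python
from pathlib import Path


def journal_storage(root: str = "/") -> str:
    """Guess where journald keeps its logs below the given filesystem root.

    Assumption: Storage=auto, i.e. logs are persistent only if
    var/log/journal exists; otherwise they sit in run/log/journal
    and do not survive a reboot.
    """
    base = Path(root)
    if (base / "var/log/journal").is_dir():
        return "persistent"
    if (base / "run/log/journal").is_dir():
        return "volatile"
    return "unknown"


if __name__ == "__main__":
    print(journal_storage())
```

If this prints persistent, journalctl -b -1 should be able to show the log of the previous boot.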
|||||
Thanks for your feedback. Here's what I found in the journal for one of the involuntary reboots:
:: Code ::
Nov 21 10:39:25 computer sudo[5606]: user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/usr/sbin/ifdown enp6s0
Nov 21 10:39:25 computer sudo[5606]: pam_unix(sudo:session): session opened for user root by (uid=0)
-- Reboot --
Nov 21 10:40:14 computer systemd-journald[478]: Journal started
Not really that much information. If I use verbose output, the first and second entries have priority 5 and 6, respectively, so nothing critical seems to be logged here. Any suggestions where to look further?
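A journal that simply stops before the "-- Reboot --" marker, with no shutdown messages, is what an abrupt reset tends to look like. As a rough sketch (the helper name and the heuristic are my own assumptions, not an established tool), this Python snippet scans plain journalctl text output and returns the last line logged before each reboot marker, assuming the marker looks exactly as in the excerpt.

```python
from typing import List


def last_lines_before_reboot(journal_text: str) -> List[str]:
    """Return the last non-empty line logged before each '-- Reboot --' marker."""
    results = []
    previous = None
    for line in journal_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped == "-- Reboot --":
            # Record what was logged right before the boot ended.
            if previous is not None:
                results.append(previous)
            previous = None
            continue
        previous = stripped
    return results


if __name__ == "__main__":
    sample = (
        "Nov 21 10:39:25 computer sudo[5606]: pam_unix(sudo:session): "
        "session opened for user root by (uid=0)\n"
        "-- Reboot --\n"
        "Nov 21 10:40:14 computer systemd-journald[478]: Journal started\n"
    )
    for entry in last_lines_before_reboot(sample):
        print(entry)
```

If the reported lines are ordinary activity rather than systemd shutdown messages, the boot most likely ended without a clean shutdown.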
|||||
You can determine whether the kernel actually crashed by doing the REISUB sequence once the system appears to be locked up.
That's Ctl + Alt + SysRq + (R E I S U B), aka "Reboot Even If System Utterly Broken". If it responds and shuts down, it means the kernel itself is still able to listen for input. Yes, it's hard to type that...
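Whether each key of the sequence does anything depends on the kernel.sysrq value in /proc/sys/kernel/sysrq. The sketch below decodes that value using the bit assignments from the kernel's sysrq documentation and reports which REISUB keys would be honoured. The function name is illustrative. The value 176 used in the example is, as far as I know, the Ubuntu default, but treat that as an assumption and check your own machine.

```python
# Bit that must be set for each key of the REISUB sequence
# (bit values as listed in the kernel's sysrq documentation).
REISUB_BITS = {
    "R": 0x4,   # unraw: keyboard control
    "E": 0x40,  # SIGTERM to all processes: signalling of processes
    "I": 0x40,  # SIGKILL to all processes: signalling of processes
    "S": 0x10,  # sync
    "U": 0x20,  # remount read-only
    "B": 0x80,  # reboot
}


def allowed_reisub_keys(mask: int) -> str:
    """Return the REISUB keys permitted by a kernel.sysrq value."""
    if mask == 1:
        # 1 is a special value meaning "all functions enabled".
        return "REISUB"
    return "".join(key for key, bit in REISUB_BITS.items() if mask & bit)


def read_sysrq_mask(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current kernel.sysrq value (Linux only)."""
    with open(path) as handle:
        return int(handle.read().strip())


print(allowed_reisub_keys(176))  # SUB
```

With such a restricted mask only sync, remount and reboot are accepted, so a machine that ignores even the final B is a fairly strong hint that the kernel is no longer processing input.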
|||||
Hmm, I wonder if ifup/ifdown is exposing a bug with 5.15. Have you tried using any other network configuration tool? I'm assuming most users of 5.15 at this point are using NetworkManager. Most likely the way it brings interfaces up and down avoids whichever sequence of events is causing issues with your system.
|||||
Thanks to both of you for your feedback. I tried your suggestions, and it looks to me like a pretty severe kernel bug. By the way, for me it is only Alt-SysRq-REISUB, without the Ctl.
I've tried both 5.15.0-3.2-liquorix-amd64 and 5.15.3-051503-generic; the results are the same.
1. enp6s0 managed manually, DHCP
a. ifdown enp6s0 in the terminal, for whatever reason, no longer freezes the machine but reboots it after ~10s, as reported yesterday for the tests with the enp6s0 static config.
b. shutdown: repeated Alt-SysRq-REISUB after the "stop job for ifdown..." freeze does not reboot the machine; hard reset required.
2. enp6s0 managed by network-manager
a. After booting and logging in, the display stays blank, shows only a blinking cursor, and then freezes after 5..10s. Alt-SysRq-REISUB does not reboot the machine; hard reset required. Therefore, I did not even get as far as trying to shut down etc.
network-manager is currently working fine with 5.14.0-19.2-liquorix-amd64, so the root cause seems to be somewhere else. I have no issues staying with 5.14 for the time being, but would like to help fix this regression. Any further suggestions are much appreciated!
r.
|||||
Just tried with 5.15.4-051504-generic but no change.
|||||
nvidiafb is crashing:
:: Code ::
Nov 22 08:28:50 ronin kernel: nvidiafb: Unable to detect display type...
Nov 22 08:28:50 ronin kernel: ...Using default of CRT
Nov 22 08:28:50 ronin kernel: nvidiafb: Unable to detect which CRTCNumber...
Nov 22 08:28:50 ronin kernel: ...Defaulting to CRTCNumber 0
Nov 22 08:28:50 ronin kernel: nvidiafb: Using CRT on CRTC 0
Nov 22 08:28:50 ronin systemd-udevd[415]: Worker [435] terminated by signal 11 (SEGV)
Nov 22 08:28:50 ronin kernel: divide error: 0000 [#1] PREEMPT SMP NOPTI
Nov 22 08:28:50 ronin kernel: CPU: 3 PID: 435 Comm: systemd-udevd Not tainted 5.15.0-3.2-liquorix-amd64 #1 liquorix 5.15-2.1~sid
Nov 22 08:28:50 ronin kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B350-F GAMING, BIOS 5603 07/28/2020
Nov 22 08:28:50 ronin kernel: RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]
Nov 22 08:28:50 ronin kernel: Code: 0f 84 a4 00 00 00 3d 30 03 00 00 0f 84 99 00 00 00 8b 8a 04 05 00 00 0f b6 c5 31 d2 41 0f af c1 44 0f b6 c9 c1 e9 10 83 e>
Nov 22 08:28:50 ronin kernel: RSP: 0018:ffffc900010d7858 EFLAGS: 00010246
Nov 22 08:28:50 ronin kernel: RAX: 0000000000000000 RBX: ffff8882105a2418 RCX: 0000000000000000
Nov 22 08:28:50 ronin kernel: RDX: 0000000000000000 RSI: ffffc900010d7884 RDI: ffff8882105a2418
Nov 22 08:28:50 ronin kernel: RBP: ffff8882105a2510 R08: ffffc900010d7880 R09: 0000000000000000
Nov 22 08:28:50 ronin kernel: R10: 00000000002e18c8 R11: 0000000000000068 R12: 0000000000062570
Nov 22 08:28:50 ronin kernel: R13: 000000000000000e R14: 0000000000000010 R15: 0000000000000008
Nov 22 08:28:50 ronin kernel: FS: 00007f8d8d29b8c0(0000) GS:ffff8886168c0000(0000) knlGS:0000000000000000
Nov 22 08:28:50 ronin kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 22 08:28:50 ronin kernel: CR2: 0000564a517e91f8 CR3: 00000001e1774000 CR4: 00000000003506e0
Nov 22 08:28:50 ronin kernel: Call Trace:
Nov 22 08:28:50 ronin kernel: <TASK>
Nov 22 08:28:50 ronin kernel: NVCalcStateExt+0x1bb/0x910 [nvidiafb]
Nov 22 08:28:50 ronin kernel: ? __slab_free+0x121/0x340
Nov 22 08:28:50 ronin kernel: ? kmem_cache_alloc_trace+0x2a6/0x410
Nov 22 08:28:50 ronin kernel: ? kmem_cache_alloc_trace+0x2a6/0x410
Nov 22 08:28:50 ronin kernel: nvidiafb_set_par+0x46c/0x9f0 [nvidiafb]
Nov 22 08:28:50 ronin kernel: fbcon_init+0x27c/0x530
Nov 22 08:28:50 ronin kernel: visual_init+0xcc/0x130
Nov 22 08:28:50 ronin kernel: do_bind_con_driver.isra.0+0x1c2/0x2d0
Nov 22 08:28:50 ronin kernel: do_take_over_console+0x116/0x1a0
Nov 22 08:28:50 ronin kernel: do_fbcon_takeover+0x5c/0xd0
Nov 22 08:28:50 ronin kernel: register_framebuffer+0x1e4/0x310
Nov 22 08:28:50 ronin kernel: nvidiafb_probe.cold+0x789/0x80a [nvidiafb]
Nov 22 08:28:50 ronin kernel: local_pci_probe+0x45/0x90
Nov 22 08:28:50 ronin kernel: ? pci_match_device+0xdf/0x140
Nov 22 08:28:50 ronin kernel: pci_device_probe+0x100/0x1c0
Nov 22 08:28:50 ronin kernel: really_probe.part.0+0xb8/0x2b0
Nov 22 08:28:50 ronin kernel: __driver_probe_device+0x90/0x120
Nov 22 08:28:50 ronin kernel: driver_probe_device+0x1e/0xe0
Nov 22 08:28:50 ronin kernel: __driver_attach+0xaf/0x190
Nov 22 08:28:50 ronin kernel: ? __device_attach_driver+0x100/0x100
Nov 22 08:28:50 ronin kernel: bus_for_each_dev+0x7c/0xd0
Nov 22 08:28:50 ronin kernel: bus_add_driver+0x11a/0x1c0
Nov 22 08:28:50 ronin kernel: driver_register+0x8f/0xf0
Nov 22 08:28:50 ronin kernel: ? nvidiafb_setcolreg+0x300/0x300 [nvidiafb]
Nov 22 08:28:50 ronin kernel: do_one_initcall+0x9e/0x270
Nov 22 08:28:50 ronin kernel: ? load_module+0x26c6/0x2a00
Nov 22 08:28:50 ronin kernel: ? security_kernel_post_read_file+0x38/0x60
Nov 22 08:28:50 ronin kernel: ? kmem_cache_alloc_trace+0x2a6/0x410
Nov 22 08:28:50 ronin kernel: do_init_module+0x5c/0x270
Nov 22 08:28:50 ronin kernel: __x64_sys_finit_module+0xac/0x100
Nov 22 08:28:50 ronin kernel: do_syscall_64+0x3b/0x90
Nov 22 08:28:50 ronin kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 22 08:28:50 ronin kernel: RIP: 0033:0x7f8d8d74c5e9
Nov 22 08:28:50 ronin kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0>
Nov 22 08:28:50 ronin kernel: RSP: 002b:00007ffef65e0f08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov 22 08:28:50 ronin kernel: RAX: ffffffffffffffda RBX: 0000564a51814030 RCX: 00007f8d8d74c5e9
Nov 22 08:28:50 ronin kernel: RDX: 0000000000000000 RSI: 00007f8d8d8f8eed RDI: 0000000000000010
Nov 22 08:28:50 ronin kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000564a5180b3f0
Nov 22 08:28:50 ronin kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 00007f8d8d8f8eed
Nov 22 08:28:50 ronin kernel: R13: 0000000000000000 R14: 0000564a516ee330 R15: 0000564a51814030
Nov 22 08:28:50 ronin kernel: </TASK>
Nov 22 08:28:50 ronin kernel: Modules linked in: snd_hda_codec_realtek intel_rapl_msr snd_hda_codec_generic intel_rapl_common ledtrig_audio snd_hda_codec_hdm>
Nov 22 08:28:50 ronin kernel: ---[ end trace 381aa40e3aff9ffc ]---
Nov 22 08:28:50 ronin kernel: RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]
Nov 22 08:28:50 ronin kernel: Code: 0f 84 a4 00 00 00 3d 30 03 00 00 0f 84 99 00 00 00 8b 8a 04 05 00 00 0f b6 c5 31 d2 41 0f af c1 44 0f b6 c9 c1 e9 10 83 e>
Nov 22 08:28:50 ronin kernel: RSP: 0018:ffffc900010d7858 EFLAGS: 00010246
Nov 22 08:28:50 ronin kernel: RAX: 0000000000000000 RBX: ffff8882105a2418 RCX: 0000000000000000
Nov 22 08:28:50 ronin kernel: RDX: 0000000000000000 RSI: ffffc900010d7884 RDI: ffff8882105a2418
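To pull the essentials out of a trace like the one above, a small parser can help. The sketch below is illustrative only: the regular expressions and the function name summarize_oops are my own, tuned to the journal format shown here rather than to every oops layout. It extracts the error type plus the faulting function and module from the first RIP line with code segment 0010, which on x86_64 indicates kernel mode.

```python
import re
from typing import Dict, Optional

# Matches e.g. "kernel: divide error: 0000 [#1] PREEMPT SMP NOPTI"
OOPS_RE = re.compile(r"kernel: (?P<error>[^:]+): \d{4} \[#\d+\]")
# Matches e.g. "RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]"
RIP_RE = re.compile(
    r"RIP: 0010:(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(log: str) -> Optional[Dict[str, Optional[str]]]:
    """Return error type, faulting function and module, or None if no kernel RIP."""
    rip = RIP_RE.search(log)
    if rip is None:
        return None
    oops = OOPS_RE.search(log)
    return {
        "error": oops.group("error") if oops else None,
        "function": rip.group("func"),
        "module": rip.group("module"),  # None for built-in code
    }


if __name__ == "__main__":
    sample = (
        "Nov 22 08:28:50 ronin kernel: divide error: 0000 [#1] PREEMPT SMP NOPTI\n"
        "Nov 22 08:28:50 ronin kernel: RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]\n"
    )
    print(summarize_oops(sample))
```

For this trace the result points at nvGetClocks in the nvidiafb module, which matches the RAX and RCX registers being zero at the time of the divide error.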
|||||
The last lines of the log are from me pressing Ctrl-Alt-Del more than 7 times. It tries to reboot, but I guess it fails for some reason. I have also been getting a crash dump on my screen just before shutting down, but that has been happening for longer than this bug has been around.
|||||