Page: 1, 2  Next

5.15.0-3.2-liquorix-amd64: machine reboots upon ifdown enp6s0 ?!
rooots
Status: Interested
Joined: 17 May 2020
Posts: 43
Hi,

first of all - thanks for bringing 5.15 to us in lqx flavour!

Unfortunately, 5.15 introduced a very severe bug that appears to be related to networking and that did not occur with 5.14. Please don't consider this a rant - I'm just absolutely baffled by what is happening. Here are the symptoms:

1. NIC configured as DHCP
a. When I try to shut down my machine, it gets stuck at "...stop job for ifdown enp6s0..." and, about 14s into the timeout, the system just freezes. Hard reset is the only way out.
b. When I run ifdown enp6s0 manually in the terminal, I get the error
:: Code ::

send_packet: Operation not permitted
dhclient.c:3010: Failed to send 300 byte long packet over fallback interface.

and the system freezes after a couple of seconds. Hard reset is the only way out.

2. NIC configured as static
a. When I try to shut down my machine, I observe the same behaviour as with DHCP.
b. When I run ifdown enp6s0 manually in the terminal - and now it gets really absurd - I get the error
:: Code ::

/etc/resolvconf/update.d/libc: Warning: /etc/resolv.conf is not a symbolic link to /run/resolvconf/resolv.conf

AND - my machine just reboots! Just to make that clear - I did not issue any reboot or shutdown commands yet! :-)

Anyway, this also happens with the stock 5.15 kernel, so it does not appear to be purely an lqx issue. But would it be possible to give us a pointer as to what could have caused this regression compared to 5.14, and where best to report it?

Here's my system - yeah, early adopters again ;-)

:: Code ::

$ inxi -bxz
System:    Kernel: 5.15.0-3.2-liquorix-amd64 x86_64 bits: 64 compiler: N/A Desktop: Xfce 4.16.3
           Distro: Ubuntu 20.04.3 LTS (Focal Fossa)
Machine:   Type: Desktop System: ASUS product: N/A v: N/A serial: <filter>
           Mobo: ASUSTeK model: ROG STRIX Z690-A GAMING WIFI D4 v: Rev 1.xx serial: <filter>
           UEFI [Legacy]: American Megatrends v: 0707 date: 11/10/2021
CPU:       8-Core: 12th Gen Intel Core i9-12900K type: MT MCP arch: N/A speed: 5600 MHz max: 3201 MHz
Graphics:  Device-1: NVIDIA vendor: Gigabyte driver: nvidia v: 470.86 bus ID: 01:00.0
           Display: x11 server: X.Org 1.20.11 driver: nvidia tty: N/A
           Message: Unable to show advanced data. Required tool glxinfo missing.
Network:   Device-1: Intel vendor: ASUSTeK driver: igc v: kernel port: 4000 bus ID: 06:00.0
Drives:    Local Storage: total: 355.84 GiB used: 1.20 TiB (344.0%)
Info:      Processes: 337 Uptime: 16m Memory: 31.17 GiB used: 1.53 GiB (4.9%) Init: systemd runlevel: 5 Compilers: gcc: 9.4.0
           Shell: bash v: 5.0.17 inxi: 3.0.38




Any help much appreciated!
r.
rooots
Status: Interested
Joined: 17 May 2020
Posts: 43
Here's an important follow-up:

With 5.14, issuing ifdown enp6s0 produces the same error messages described above in both scenarios - but no freezes or reboots occur! That is, the interface is simply brought down and can be brought up again without any problems.
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
Are you sure this isn't a systemd regression, maybe triggered by a change in the kernel? The "stop job is running" thing is one of the hallmarks of a systemd mistake or logic error. It's one reason I stopped shutting down my system and started using suspend or hibernate instead: I got sick of those "stop job is running" messages that run eternally. The crash during it, however, sounds like a kernel thing. That should be documented in journald somewhere, assuming you have it set to log the previous session as well as the current boot session.
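For anyone unsure whether their journal keeps the previous session: a minimal sketch, assuming the default Storage=auto setting, where journald only keeps logs across reboots if /var/log/journal exists. The function names journal_is_persistent and previous_boot_kernel_log are made up for illustration.

```shell
# Succeeds if journald has a persistent store, i.e. logs survive a reboot.
# $1: journal directory (assumed default: the standard persistent location).
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

# Kernel messages (-k) from the previous boot (-b -1), without a pager.
# Defined here but not called automatically; may need root or the
# systemd-journal group to see everything.
previous_boot_kernel_log() {
    journalctl -k -b -1 --no-pager
}

if journal_is_persistent; then
    echo "persistent journal found: previous_boot_kernel_log should work"
else
    echo "no /var/log/journal: with Storage=auto, logs are lost on reboot"
fi
```

If the directory is missing, setting Storage=persistent in /etc/systemd/journald.conf and restarting systemd-journald should make the next crash visible after the reboot, provided the kernel managed to write anything before going down.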
rooots
Status: Interested
Joined: 17 May 2020
Posts: 43
Thanks for your feedback. Here's what I found in the journal for one of the involuntary reboots:

:: Code ::

Nov 21 10:39:25 computer sudo[5606]:    user : TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=/usr/sbin/ifdown enp6s0
Nov 21 10:39:25 computer sudo[5606]: pam_unix(sudo:session): session opened for user root by (uid=0)
-- Reboot --
Nov 21 10:40:14 computer systemd-journald[478]: Journal started


Not really that much information. With verbose output, the first and second entries have priority 5 and 6, respectively, so nothing critical seems to be logged here. Any suggestions on where to look further?
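For reference, the numeric priorities in verbose journal output are the usual syslog levels, so 5 and 6 are "notice" and "info". A small sketch of the mapping, plus one way to narrow the previous boot down to warnings and worse; the helper names are hypothetical, and this is only one possible place to look.

```shell
# Map a numeric syslog/journal priority (0-7) to its name.
syslog_priority_name() {
    case $1 in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warning ;;
        5) echo notice ;;
        6) echo info ;;
        7) echo debug ;;
        *) return 1 ;;   # not a valid priority
    esac
}

# Everything from the previous boot at priority "warning" (4) or more severe.
# Defined here but not called automatically.
previous_boot_warnings() {
    journalctl -b -1 -p warning --no-pager
}

echo "priority 5 = $(syslog_priority_name 5), priority 6 = $(syslog_priority_name 6)"
```

An empty result from previous_boot_warnings would not rule out a kernel crash: if the machine resets instantly, the last messages may never reach the disk.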
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
You can determine whether the kernel actually crashed by doing the RSEIUB sequence once the system appears to be locked up.

That's Ctrl + Alt + SysRq + (R S E I U B), aka Reboot System Even If Utterly Borked

If it responds and shuts down, it means the kernel itself is still able to listen for input.

Yes, it's hard to type that...
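One thing that may be worth checking before relying on the key sequence: the kernel only honours the SysRq functions that the kernel.sysrq setting allows. The sketch below decodes that bitmask for the six keys in the sequence. The function name sysrq_allows is made up for illustration, and the bit values are the ones I know from the kernel's sysrq documentation, so treat them as an assumption to verify.

```shell
# Succeeds if the given kernel.sysrq value permits the given SysRq key.
# $1: value of /proc/sys/kernel/sysrq, $2: key letter (r e i s u b)
sysrq_allows() {
    _mask=$1
    case $2 in
        r)   _bit=4 ;;    # unraw: keyboard control
        e|i) _bit=64 ;;   # SIGTERM / SIGKILL: signalling of processes
        s)   _bit=16 ;;   # sync
        u)   _bit=32 ;;   # remount read-only
        b)   _bit=128 ;;  # reboot
        *)   return 2 ;;  # key not covered by this sketch
    esac
    # A value of exactly 1 enables everything; otherwise test the bit.
    [ "$_mask" -eq 1 ] || [ $(( _mask & _bit )) -ne 0 ]
}

# Report the current setting, if the file is readable on this system.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    for key in r e i s u b; do
        if sysrq_allows "$current" "$key"; then
            echo "$key: allowed"
        else
            echo "$key: blocked"
        fi
    done
fi
```

With a value of 176 (which, as far as I know, Ubuntu has used as its default), only S, U and B would be honoured, so a machine that ignores part of the sequence is not necessarily dead.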
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 1122
Hmm, I wonder if ifup/ifdown is exposing a bug with 5.15. Have you tried using any other network configuration tool? I'm assuming most users of 5.15 at this point are using NetworkManager. Most likely the way it brings interfaces up and down avoids whatever sequence of events is causing issues on your system.
rooots
Status: Interested
Joined: 17 May 2020
Posts: 43
Thanks to both of you for your feedback. I tried your suggestions, and it looks to me like a pretty severe kernel bug. By the way - for me it is only ALT-SysRq-REISUB, without the CTL.

I've tried both 5.15.0-3.2-liquorix-amd64 and 5.15.3-051503-generic, results are the same.

1. enp6s0 managed manually, DHCP
a. ifdown enp6s0 in the terminal, for whatever reason, no longer freezes the machine but reboots it after ~10s, as reported yesterday for the tests with the enp6s0 STATIC config.
b. shutdown: repeated ALT-SysRq-REISUB after "stop job for ifdown..." freezes does not reboot the machine; a hard reset is required.

2. enp6s0 managed by network-manager
a. After booting and logging in, the display stays blank, showing only a blinking cursor, and the system freezes after 5..10s. ALT-SysRq-REISUB does not reboot the machine; a hard reset is required. Therefore, I did not even get as far as trying to shut down.

network-manager currently works fine with 5.14.0-19.2-liquorix-amd64, so the root cause seems to be somewhere else. I have no problem staying with 5.14 for the time being, but would like to help fix this regression.

Any further suggestions are much appreciated!

r.
rooots
Status: Interested
Joined: 17 May 2020
Posts: 43
Just tried with 5.15.4-051504-generic but no change.
Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
nvidiafb is crashing:

:: Code ::
Nov 22 08:28:50 ronin kernel: nvidiafb: Unable to detect display type...
Nov 22 08:28:50 ronin kernel: ...Using default of CRT
Nov 22 08:28:50 ronin kernel: nvidiafb: Unable to detect which CRTCNumber...
Nov 22 08:28:50 ronin kernel: ...Defaulting to CRTCNumber 0
Nov 22 08:28:50 ronin kernel: nvidiafb: Using CRT on CRTC 0
Nov 22 08:28:50 ronin systemd-udevd[415]: Worker [435] terminated by signal 11 (SEGV)
Nov 22 08:28:50 ronin kernel: divide error: 0000 [#1] PREEMPT SMP NOPTI
Nov 22 08:28:50 ronin kernel: CPU: 3 PID: 435 Comm: systemd-udevd Not tainted 5.15.0-3.2-liquorix-amd64 #1  liquorix 5.15-2.1~sid
Nov 22 08:28:50 ronin kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B350-F GAMING, BIOS 5603 07/28/2020
Nov 22 08:28:50 ronin kernel: RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]
Nov 22 08:28:50 ronin kernel: Code: 0f 84 a4 00 00 00 3d 30 03 00 00 0f 84 99 00 00 00 8b 8a 04 05 00 00 0f b6 c5 31 d2 41 0f af c1 44 0f b6 c9 c1 e9 10 83 e>
Nov 22 08:28:50 ronin kernel: RSP: 0018:ffffc900010d7858 EFLAGS: 00010246
Nov 22 08:28:50 ronin kernel: RAX: 0000000000000000 RBX: ffff8882105a2418 RCX: 0000000000000000
Nov 22 08:28:50 ronin kernel: RDX: 0000000000000000 RSI: ffffc900010d7884 RDI: ffff8882105a2418
Nov 22 08:28:50 ronin kernel: RBP: ffff8882105a2510 R08: ffffc900010d7880 R09: 0000000000000000
Nov 22 08:28:50 ronin kernel: R10: 00000000002e18c8 R11: 0000000000000068 R12: 0000000000062570
Nov 22 08:28:50 ronin kernel: R13: 000000000000000e R14: 0000000000000010 R15: 0000000000000008
Nov 22 08:28:50 ronin kernel: FS:  00007f8d8d29b8c0(0000) GS:ffff8886168c0000(0000) knlGS:0000000000000000
Nov 22 08:28:50 ronin kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 22 08:28:50 ronin kernel: CR2: 0000564a517e91f8 CR3: 00000001e1774000 CR4: 00000000003506e0
Nov 22 08:28:50 ronin kernel: Call Trace:
Nov 22 08:28:50 ronin kernel:  <TASK>
Nov 22 08:28:50 ronin kernel:  NVCalcStateExt+0x1bb/0x910 [nvidiafb]
Nov 22 08:28:50 ronin kernel:  ? __slab_free+0x121/0x340
Nov 22 08:28:50 ronin kernel:  ? kmem_cache_alloc_trace+0x2a6/0x410
Nov 22 08:28:50 ronin kernel:  ? kmem_cache_alloc_trace+0x2a6/0x410
Nov 22 08:28:50 ronin kernel:  nvidiafb_set_par+0x46c/0x9f0 [nvidiafb]
Nov 22 08:28:50 ronin kernel:  fbcon_init+0x27c/0x530
Nov 22 08:28:50 ronin kernel:  visual_init+0xcc/0x130
Nov 22 08:28:50 ronin kernel:  do_bind_con_driver.isra.0+0x1c2/0x2d0
Nov 22 08:28:50 ronin kernel:  do_take_over_console+0x116/0x1a0
Nov 22 08:28:50 ronin kernel:  do_fbcon_takeover+0x5c/0xd0
Nov 22 08:28:50 ronin kernel:  register_framebuffer+0x1e4/0x310
Nov 22 08:28:50 ronin kernel:  nvidiafb_probe.cold+0x789/0x80a [nvidiafb]
Nov 22 08:28:50 ronin kernel:  local_pci_probe+0x45/0x90
Nov 22 08:28:50 ronin kernel:  ? pci_match_device+0xdf/0x140
Nov 22 08:28:50 ronin kernel:  pci_device_probe+0x100/0x1c0
Nov 22 08:28:50 ronin kernel:  really_probe.part.0+0xb8/0x2b0
Nov 22 08:28:50 ronin kernel:  __driver_probe_device+0x90/0x120
Nov 22 08:28:50 ronin kernel:  driver_probe_device+0x1e/0xe0
Nov 22 08:28:50 ronin kernel:  __driver_attach+0xaf/0x190
Nov 22 08:28:50 ronin kernel:  ? __device_attach_driver+0x100/0x100
Nov 22 08:28:50 ronin kernel:  bus_for_each_dev+0x7c/0xd0
Nov 22 08:28:50 ronin kernel:  bus_add_driver+0x11a/0x1c0
Nov 22 08:28:50 ronin kernel:  driver_register+0x8f/0xf0
Nov 22 08:28:50 ronin kernel:  ? nvidiafb_setcolreg+0x300/0x300 [nvidiafb]
Nov 22 08:28:50 ronin kernel:  do_one_initcall+0x9e/0x270
Nov 22 08:28:50 ronin kernel:  ? load_module+0x26c6/0x2a00
Nov 22 08:28:50 ronin kernel:  ? security_kernel_post_read_file+0x38/0x60
Nov 22 08:28:50 ronin kernel:  ? kmem_cache_alloc_trace+0x2a6/0x410
Nov 22 08:28:50 ronin kernel:  do_init_module+0x5c/0x270
Nov 22 08:28:50 ronin kernel:  __x64_sys_finit_module+0xac/0x100
Nov 22 08:28:50 ronin kernel:  do_syscall_64+0x3b/0x90
Nov 22 08:28:50 ronin kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Nov 22 08:28:50 ronin kernel: RIP: 0033:0x7f8d8d74c5e9
Nov 22 08:28:50 ronin kernel: Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0>
Nov 22 08:28:50 ronin kernel: RSP: 002b:00007ffef65e0f08 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
Nov 22 08:28:50 ronin kernel: RAX: ffffffffffffffda RBX: 0000564a51814030 RCX: 00007f8d8d74c5e9
Nov 22 08:28:50 ronin kernel: RDX: 0000000000000000 RSI: 00007f8d8d8f8eed RDI: 0000000000000010
Nov 22 08:28:50 ronin kernel: RBP: 0000000000020000 R08: 0000000000000000 R09: 0000564a5180b3f0
Nov 22 08:28:50 ronin kernel: R10: 0000000000000010 R11: 0000000000000246 R12: 00007f8d8d8f8eed
Nov 22 08:28:50 ronin kernel: R13: 0000000000000000 R14: 0000564a516ee330 R15: 0000564a51814030
Nov 22 08:28:50 ronin kernel:  </TASK>
Nov 22 08:28:50 ronin kernel: Modules linked in: snd_hda_codec_realtek intel_rapl_msr snd_hda_codec_generic intel_rapl_common ledtrig_audio snd_hda_codec_hdm>
Nov 22 08:28:50 ronin kernel: ---[ end trace 381aa40e3aff9ffc ]---
Nov 22 08:28:50 ronin kernel: RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]
Nov 22 08:28:50 ronin kernel: Code: 0f 84 a4 00 00 00 3d 30 03 00 00 0f 84 99 00 00 00 8b 8a 04 05 00 00 0f b6 c5 31 d2 41 0f af c1 44 0f b6 c9 c1 e9 10 83 e>
Nov 22 08:28:50 ronin kernel: RSP: 0018:ffffc900010d7858 EFLAGS: 00010246
Nov 22 08:28:50 ronin kernel: RAX: 0000000000000000 RBX: ffff8882105a2418 RCX: 0000000000000000
Nov 22 08:28:50 ronin kernel: RDX: 0000000000000000 RSI: ffffc900010d7884 RDI: ffff8882105a2418

Mono
Status: Contributor
Joined: 21 Jun 2012
Posts: 115
The last lines of the log are from me pressing ctrl-alt-del more than 7 times. It tries to reboot, but I guess it fails for some reason. I have also been getting a crash dump on my screen just before shutting down, but that has been happening for longer than this bug has existed.
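When comparing traces like the one above between machines, it can help to reduce each one to the faulting function and module. A rough sketch, assuming the "RIP: 0010:symbol+offset/size [module]" layout seen in that log; the function name oops_fault_site is hypothetical, and faults in built-in code (no bracketed module name) are deliberately not matched.

```shell
# Read kernel log lines on stdin and print "symbol offset module" for each
# distinct kernel-mode RIP line that names a loadable module.
oops_fault_site() {
    sed -n 's|.*RIP: 0010:\([A-Za-z0-9_.]*\)+\(0x[0-9a-f]*\)/0x[0-9a-f]* \[\([A-Za-z0-9_]*\)\].*|\1 \2 \3|p' | sort -u
}

# Example with one line copied from the trace above.
echo 'Nov 22 08:28:50 ronin kernel: RIP: 0010:nvGetClocks+0x1ad/0x2c0 [nvidiafb]' | oops_fault_site
```

On a live system the input could come from something like journalctl -k -b -1 --no-pager piped into oops_fault_site. For the trace above it points at nvGetClocks in nvidiafb, which matches the "divide error" reported while systemd-udevd was loading that module.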
All times are GMT - 8 Hours