[RESOLVED] Frequent temporary hangs with 20.04 and 5.7.0-19.1-liquorix-amd64
I had a largely stock Ubuntu 20.04 and installed Liquorix because I was after fsync support for gaming (which worked very well on Greedfall btw, I went from 10-15 FPS to over 60 - thanks!).
After installing it I noticed frequent (at least once a minute) temporary hangs during which all input became unresponsive for 2-15 seconds.
This happened regardless of what I was doing (scrolling through a page in Firefox, typing in Slack, using a terminal). Keyboard input during each hang was buffered and appeared once it unfroze.
This is reproducible for me with 5.7.0-19.1-liquorix-amd64, and it happens under both X11 and Wayland. It goes away if I downgrade the kernel back to 5.4.0-45-generic.
Watching top while it happens doesn't tell me anything, because top itself stops updating for the duration of the hang. Nothing interesting shows up in the system logs. I've tried disabling swap (not really used on this machine anyway) because of some similar Ubuntu reports. I've run out of ideas.
Any tips on how to investigate further please?
:: Code ::
System: Host: brian-desktop Kernel: 5.4.0-45-generic x86_64 bits: 64 compiler: gcc v: 9.3.0 Desktop: Gnome 3.36.4
wm: gnome-shell dm: GDM3 Distro: Ubuntu 20.04.1 LTS (Focal Fossa)
Machine: Type: Desktop Mobo: Micro-Star model: MAG X570 TOMAHAWK WIFI (MS-7C84) v: 1.0 serial: <superuser/root required>
UEFI [Legacy]: American Megatrends v: 1.00 date: 04/11/2020
CPU: Topology: 12-Core model: AMD Ryzen 9 3900X bits: 64 type: MT MCP arch: Zen L2 cache: 6144 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 182394
Speed: 2199 MHz min/max: 2200/3800 MHz Core speeds (MHz): 1: 2200 2: 2199 3: 2195 4: 2195 5: 2199 6: 2200 7: 2200
8: 2200 9: 2199 10: 2194 11: 2199 12: 2199 13: 2199 14: 2191 15: 2191 16: 2193 17: 2199 18: 2199 19: 2200 20: 2200
21: 2199 22: 2200 23: 2200 24: 2199
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] vendor: ASUSTeK
driver: amdgpu v: kernel bus ID: 2d:00.0 chip ID: 1002:67df
Display: wayland server: X.Org 1.20.8 driver: amdgpu compositor: gnome-shell
resolution: 1920x1200~60Hz, 1920x1200~60Hz
OpenGL: renderer: Radeon RX 580 Series (POLARIS10 DRM 3.35.0 5.4.0-45-generic LLVM 10.0.1) v: 4.6 Mesa 20.2.0-rc3
direct render: Yes
Audio: Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: ASUSTeK driver: snd_hda_intel
v: kernel bus ID: 2d:00.1 chip ID: 1002:aaf0
Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
v: kernel bus ID: 2f:00.4 chip ID: 1022:1487
Device-3: C-Media Antlion USB adapter type: USB driver: hid-generic,snd-usb-audio,usbhid bus ID: 3-2:2
chip ID: 0d8c:002b
Device-4: Logitech OrbiCam type: USB driver: snd-usb-audio,uvcvideo bus ID: 5-4:3 chip ID: 046d:0892
Sound Server: ALSA v: k5.4.0-45-generic
Network: Device-1: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8125 v: 9.003.05-NAPI port: f000 bus ID: 26:00.0
chip ID: 10ec:8125
IF: enp38s0 state: up speed: 1000 Mbps duplex: full mac: 2c:f0:5d:3c:a2:2f
IP v4: 192.168.50.252/24 type: dynamic noprefixroute scope: global broadcast: 192.168.50.255
IP v6: fe80::7ac:952b:4031:d588/64 type: noprefixroute scope: link
Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel port: f000 bus ID: 28:00.0 chip ID: 8086:2723
IF: wlo1 state: down mac: 14:f6:d8:12:89:e1
IF-ID-1: docker0 state: down mac: 02:42:17:e9:27:67
IP v4: 172.17.0.1/16 scope: global broadcast: 172.17.255.255
IF-ID-2: wgpia0 state: unknown speed: N/A duplex: N/A mac: N/A
IP v4: 10.6.185.133/32 scope: global
WAN IP: 126.96.36.199
Drives: Local Storage: total: 1.86 TiB used: 1.19 TiB (63.9%)
ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNW020T8 size: 1.86 TiB speed: 31.6 Gb/s lanes: 4
Partition: ID-1: / size: 1.79 TiB used: 1.19 TiB (66.6%) fs: ext4 dev: /dev/dm-1
ID-2: /boot size: 703.5 MiB used: 213.9 MiB (30.4%) fs: ext4 dev: /dev/nvme0n1p5
ID-3: swap-1 size: 976.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-2
Sensors: System Temperatures: cpu: 48.6 C mobo: 33.0 C gpu: amdgpu temp: 36 C
Fan Speeds (RPM): fan-1: 998 fan-2: 0 fan-3: 966 fan-4: 907 fan-5: 1189 fan-6: 1231 fan-7: 0 gpu: amdgpu fan: 1280
Info: Processes: 462 Uptime: 3h 04m Memory: 62.83 GiB used: 3.33 GiB (5.3%) Init: systemd v: 245 runlevel: 5 Compilers:
gcc: 9.3.0 alt: 9 Shell: bash v: 5.0.17 running in: terminator inxi: 3.0.38
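Since top freezes along with everything else, one way to at least timestamp the hangs is a small watchdog loop that sleeps for a short interval and reports whenever it wakes up much later than expected. What follows is a minimal Python sketch under stated assumptions: the names `find_stalls` and `watch`, the 0.1 s interval and the 1 s threshold are all made up for illustration and would need tuning.

```python
import time
from typing import Iterable, List, Tuple


def find_stalls(timestamps: Iterable[float], interval: float,
                threshold: float) -> List[Tuple[float, float]]:
    """Return (start_time, overshoot_seconds) for each suspicious gap.

    A gap counts as a stall when it exceeds the expected sleep interval
    by more than `threshold` seconds.
    """
    stalls = []
    previous = None
    for now in timestamps:
        if previous is not None:
            overshoot = (now - previous) - interval
            if overshoot > threshold:
                stalls.append((previous, overshoot))
        previous = now
    return stalls


def watch(interval: float = 0.1, threshold: float = 1.0) -> None:
    """Loop forever, printing a wall-clock stamp after each detected stall."""
    previous = time.monotonic()
    while True:
        time.sleep(interval)
        now = time.monotonic()
        overshoot = (now - previous) - interval
        if overshoot > threshold:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} stalled for about {overshoot:.1f}s", flush=True)
        previous = now
```

Calling `watch()` in a terminal and leaving it running gives timestamps that can be lined up against `journalctl -k`. If the loop reports nothing while the desktop visibly hangs, that would suggest the stall is limited to input or the compositor rather than the whole scheduler, though that is only a rough heuristic.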
Are you getting this behavior only on 5.7.0-19.1? Because kernel development quickly moves on to the next major version, a lot of stable patches get backported to the previous stable kernel and bring new errata with them; once that kernel goes EOL, those issues are left outstanding.
I'll be working on getting 5.8 going at the end of this week and over the weekend, so I don't have much else to say besides that a big update is coming soon (also pending an official port of MuQSS from Con).
One thing you can try in the meantime: what happens if you boot with the parameter rqshare=mc? Liquorix is configured to use mc-llc, which sets up one runqueue per CCX on Ryzen, but maybe something buggy is happening there.
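On Ubuntu, a boot parameter like this is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by `sudo update-grub` and a reboot. To confirm that the parameter actually reached the kernel, something like the following Python sketch could be used. The helper names `rqshare_from_cmdline` and `current_rqshare` are hypothetical, and taking the last occurrence when the parameter appears more than once is an assumption of this sketch, not documented MuQSS behavior.

```python
from typing import Optional


def rqshare_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the last rqshare= token, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("rqshare="):
            # Assumption: if the parameter is repeated, the last one wins.
            value = token[len("rqshare="):]
    return value


def current_rqshare(path: str = "/proc/cmdline") -> Optional[str]:
    """Read the running kernel's command line and extract rqshare."""
    with open(path, encoding="utf-8") as handle:
        return rqshare_from_cmdline(handle.read())
```

After rebooting, `print(current_rqshare())` should show `mc` if the parameter was picked up, and `None` if the kernel was booted without it.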
Also, can you refresh your inxi output for when you're running Liquorix? Just want to make sure nothing obvious sticks out.
And finally, the GPU line in your output looks odd. The renderer string mentions LLVM, which is more typical of software acceleration: OpenGL: renderer: Radeon RX 580 Series (POLARIS10 DRM 3.35.0 5.4.0-45-generic LLVM 10.0.1) v: 4.6 Mesa 20.2.0-rc3
Hi and thanks for the reply.
Yes, I've only tried 5.7.0-19.1. Happy to give 5.8 a spin when it's ready.
I'll try rqshare=mc tomorrow and update with the results and inxi output.
You are right about the GPU output - I had forgotten I was running a Mesa RC from ernstp/mesarc to see if newer driver versions helped with some game stability issues. I will try again with the Mesa 20.0.8 that ships with Focal.
Also, a quick update: I released a 5.8 kernel on all avenues (Arch, Debian, Ubuntu). So far so good. Try upgrading to that and see if it helps.
Running the 5.8 kernel resolved this for me - thanks for the help.
Great! I'll mark this thread as resolved.
Hi, sadly it does not seem resolved for me. I used Liquorix 5.7 before and had issues similar to those described here. I didn't have time to diagnose them, so I switched back to stock. When I heard that 5.8 was out I decided to give it a go. The hangs seem to be shorter and less frequent, but they are still there. The good news is that with rqshare=mc everything runs butter smooth.
Here is my inxi -bxx output:
:: Code ::
System: Host: martin-X470-AORUS-ULTRA-GAMING Kernel: 5.8.0-8.1-liquorix-amd64 x86_64 bits: 64 compiler: N/A
Desktop: KDE Plasma 5.19.5 tk: Qt 5.15.0 wm: kwin_x11 dm: SDDM Distro: KDE neon 20.04 5.19
base: Ubuntu 20.04 LTS Focal
Machine: Type: Desktop System: Gigabyte product: X470 AORUS ULTRA GAMING v: N/A serial: <superuser/root required>
Mobo: Gigabyte model: X470 AORUS ULTRA GAMING-CF v: x.x serial: <superuser/root required> UEFI: American Megatrends
v: F51g date: 07/02/2020
CPU: 12-Core: AMD Ryzen 9 3900X type: MT MCP arch: Zen speed: 2200 MHz min/max: 2200/3800 MHz
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] vendor: Gigabyte driver: amdgpu
v: kernel bus ID: 0b:00.0 chip ID: 1002:687f
Display: x11 server: X.Org 1.20.8 driver: amdgpu compositor: kwin_x11 resolution: 1920x1080~60Hz
OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.38.0 5.8.0-8.1-liquorix-amd64 LLVM 10.0.1)
v: 4.6 Mesa 20.1.7 - kisak-mesa PPA direct render: Yes
Network: Device-1: Intel I211 Gigabit Network vendor: Gigabyte driver: igb v: 5.6.0-k port: f000 bus ID: 06:00.0
chip ID: 8086:1539
Drives: Local Storage: total: 4.09 TiB used: 3.26 TiB (79.7%)
Info: Processes: 460 Uptime: 8m Memory: 15.64 GiB used: 2.90 GiB (18.5%) Init: systemd v: 245 runlevel: 5 Compilers:
gcc: 9.3.0 alt: 7/9 Shell: zsh v: 5.8 running in: yakuake inxi: 3.0.38
Once again, a Ryzen 9 and an AMD video card.
I recall reading somewhere that the amdgpu driver is aware of Zen topology and pins itself to a particular CCX. Maybe that is causing some scheduling conflict when using mc-llc?
Thanks for finding the change that resolves this behavior. I pushed out a change setting MC runqueue sharing as default and will push out a new kernel shortly: github.com/damentz/liquorix-package/commit/bfc9cf6bfa25e4e4fa9261c63f59b9c325802236
rechapita, before you go completely, could you post the MuQSS RQ output from dmesg/journalctl -k in full to this forum? It will be useful in the future for Con or someone else to diagnose what might be going on with these newer Zen2+ generation CPUs.
And the output you're looking for will roughly start and end like this:
:: Code ::
MuQSS possible/present/online CPUs: 16/16/16
MuQSS locality CPU 0 to 0: 0
MuQSS locality CPU 0 to 1: 2
MuQSS locality CPU 0 to 2: 2
MuQSS locality CPU 0 to 3: 2
MuQSS locality CPU 0 to 4: 2
MuQSS locality CPU 0 to 5: 2
MuQSS locality CPU 0 to 6: 2
MuQSS locality CPU 0 to 7: 2
MuQSS locality CPU 0 to 8: 1
MuQSS locality CPU 0 to 9: 2
MuQSS locality CPU 0 to 10: 2
MuQSS locality CPU 0 to 11: 2
MuQSS locality CPU 0 to 12: 2
MuQSS locality CPU 0 to 13: 2
MuQSS locality CPU 0 to 14: 2
MuQSS locality CPU 0 to 15: 2
MuQSS CPU 15 llc 0 CPU order 4 RQ 13 llc 0
MuQSS CPU 15 llc 0 CPU order 5 RQ 5 llc 0
MuQSS CPU 15 llc 0 CPU order 6 RQ 12 llc 0
MuQSS CPU 15 llc 0 CPU order 7 RQ 4 llc 0
MuQSS CPU 15 llc 0 CPU order 8 RQ 11 llc 0
MuQSS CPU 15 llc 0 CPU order 9 RQ 3 llc 0
MuQSS CPU 15 llc 0 CPU order 10 RQ 10 llc 0
MuQSS CPU 15 llc 0 CPU order 11 RQ 2 llc 0
MuQSS CPU 15 llc 0 CPU order 12 RQ 9 llc 0
MuQSS CPU 15 llc 0 CPU order 13 RQ 1 llc 0
MuQSS CPU 15 llc 0 CPU order 14 RQ 8 llc 0
MuQSS CPU 15 llc 0 CPU order 15 RQ 0 llc 0
MuQSS runqueue share type LLC total runqueues: 1
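To collect that output without scrolling through the whole kernel log, the MuQSS lines can be filtered out of the text of `journalctl -k` or `dmesg`. This is a rough Python sketch, assuming every relevant message contains the literal text "MuQSS " and that the summary line has the same shape as the last line above. The function names `muqss_lines` and `share_summary` are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches the summary line, e.g.
# "MuQSS runqueue share type LLC total runqueues: 1"
SHARE_RE = re.compile(r"MuQSS runqueue share type (\S+) total runqueues: (\d+)")


def muqss_lines(log_text: str) -> List[str]:
    """Return every MuQSS message, with any journal prefix stripped."""
    lines = []
    for line in log_text.splitlines():
        start = line.find("MuQSS ")
        if start != -1:
            lines.append(line[start:].rstrip())
    return lines


def share_summary(log_text: str) -> Optional[Tuple[str, int]]:
    """Return (share type, total runqueues) from the summary line, if any."""
    match = SHARE_RE.search(log_text)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

For example, piping `journalctl -k` into a script that passes `sys.stdin.read()` to `muqss_lines` and prints the result, one line each, should give something that can be pasted into a post in full.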