Page: Previous  1, 2, 3, 4  Next

klaxian
Status: Contributor
Joined: 18 Mar 2011
Posts: 65
Location: New York, USA
After more research, I think there is still a problem here. Even though resetting my cpufreq up_threshold helped a little with power usage, there is still more CPU usage than I would expect and cores are still frequently entering high power states.

Before kernel 4.10, the kworker and irq/nvidia/irqsoftbalance processes barely used any CPU (<1%). Starting with kernel 4.10, those processes consistently use about 10-20% CPU at idle - much more than any other process. CPU states and power consumption confirm that this isn't just a difference in accounting. Stopping my desktop manager eliminates the load from irq/nvidia (as expected), but not from the kworkers. Everything looks normal in /proc/interrupts.

Any idea what the newer kernels are doing that's using so many CPU resources? Or is there any way I can find out? Disk, network, memory, etc. usage is unchanged between kernel versions.

I'm beginning to think this issue is not related to the scheduler at all...
klaxian
Status: Contributor
Joined: 18 Mar 2011
Posts: 65
Location: New York, USA
Ubuntu's packaged 4.10 kernel does not have this problem. That must mean that one of the zen/liquorix patches is causing the extra CPU usage from kworker and irq processes. Let me know if there's anything else I can do to help. For now, I have to stay on liquorix 4.9.
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 732
What you're seeing is the ondemand configuration of Liquorix keeping your CPUs awake, plus the threaded IRQ configuration, which may also be a factor.

Not only is the up_threshold for ondemand on MuQSS set to 45, but the sampling_down_factor is set to 10. This was the only configuration I could find that let certain types of applications, like Dolphin (the Gamecube and Wii emulator), actually run at full speed. The MuQSS scheduling policy would never let a process run long enough on a single core for ondemand to notice the higher load and raise the frequency, causing a 25-50% drop in performance.
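For reference, these tunables live under sysfs when the ondemand governor is active. A sketch of how to inspect or hand-apply the values described above (requires root to write; paths assume the global ondemand tunables, not per-policy ones):

```shell
# Inspect the current ondemand tunables (governor must be ondemand
# for this directory to exist).
cat /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
cat /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor

# Apply the values discussed above by hand (requires root):
echo 45 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/up_threshold
echo 10 | sudo tee /sys/devices/system/cpu/cpufreq/ondemand/sampling_down_factor
```

Changes made this way don't survive a reboot, so they're handy for experimenting before baking a value into the kernel config.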

In his MuQSS v156 patch, Con changed the way time is counted, which probably affects ondemand as well. But what's probably also affecting what you're seeing is the hrtimer and timeout patches that were added recently: https://github.com/zen-kernel/zen-kernel/commit/40bae85bf1a7684c79bd12999d3464e22c6d7c6c

:: Code ::
muqss: Add CK patches for high res timers
Source: http://ck.kolivas.org/patches/4.0/4.11/4.11-ck1/patches/

Patches:
0004-Create-highres-timeout-variants-of-schedule_timeout-.patch
0005-Special-case-calls-of-schedule_timeout-1-to-use-the-.patch
0006-Convert-msleep-to-use-hrtimers-when-active.patch
0007-Replace-all-schedule-timeout-1-with-schedule_min_hrt.patch
0008-Replace-all-calls-to-schedule_timeout_interruptible-.patch
0009-Replace-all-calls-to-schedule_timeout_uninterruptibl.patch
0010-Make-hrtimer-granularity-and-minimum-hrtimeout-confi.patch
0011-Don-t-use-hrtimer-overlay-when-pm_freezing-since-som.patch
0012-Make-threaded-IRQs-optionally-the-default-which-can-.patch


I'll need to experiment, but it seems we need to turn off threaded IRQs, since they keep the system awake and burn more power. And with the time accounting fixes, ondemand may behave more similarly to how it does under CFS, making it safe to make ondemand more resistant to raising the CPU frequency (and more likely to reduce it).
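For anyone who wants to check whether their interrupt handlers are actually running threaded, the handler threads show up as kernel threads named irq/<number>-<name> (a sketch using standard procps options; on mainline kernels the threadirqs boot parameter controls this behaviour):

```shell
# Threaded IRQ handlers appear as kernel threads named irq/<num>-<name>.
# If this prints nothing, interrupt handlers are running in hard-irq
# context rather than in dedicated threads.
ps -e -o pid=,comm= | awk '$2 ~ /^irq\//'
```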
klaxian
Status: Contributor
Joined: 18 Mar 2011
Posts: 65
Location: New York, USA
While some tuning of the ondemand governor might help some things, I don't think that's the real problem here anymore. My kworker processes are actually using far more CPU cycles than in earlier or stock kernels (not just accounting changes), regardless of the cpufreq settings. I'm not sure what this kernel is doing, but it's using ~20% of all cores at idle. It might be related to some MuQSS computations, but MuQSS might not be the culprit at all. Disabling IRQ threads in 4.11.0-3.2 had no effect on this issue, so I suggest enabling them again.

I also suggest changing the cpufreq/ondemand default tunables back to the way they were before 4.11.0-3.2. First, something (maybe MuQSS) is overriding your up_threshold anyway; it still ends up at 45 by default. Ubuntu derivatives boot with the governor set to "performance" and then switch to "ondemand" afterward, which might be resetting your values. I had up_threshold set to 50 in older kernels and found it a good mix of responsiveness and power savings. Changing it to 1 makes the cores ramp to max speed under even the slightest load; they end up in P0 all the time. I'd leave it at 45.

Next, the old sampling_down_factor value of 10 reduces the number of load checks while the CPU is in its highest power state. This lowers overhead in high-load situations and prevents the CPU from clocking down as often when it is at nearly 100% utilization. There might be some slight power savings from your change, but it can also hurt both throughput and responsiveness. I suggest reverting sampling_down_factor to 10.

https://www.kernel.org/doc/Documentation/cpu-freq/governors.txt

As for my specific issue of kernel CPU usage, do you know how I can find out what the kworker threads are actually doing that is using so much CPU in the newer liquorix kernels? Obviously, strace wouldn't work in this case. Liquorix <= 4.9 and Ubuntu 4.10 kernels are not affected. Is there anything else I can test or other information that would help?
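For what it's worth, the one approach I know of for this is the kernel's workqueue tracepoints, which log each work item as it is queued - a sketch, assuming debugfs is mounted and the kernel was built with tracing support (on newer kernels the path is /sys/kernel/tracing):

```shell
# Enable the workqueue tracepoint and start tracing (requires root).
cd /sys/kernel/debug/tracing
echo workqueue:workqueue_queue_work > set_event
echo 1 > tracing_on

# Watch which work functions are being queued onto the kworkers;
# the function names point at the subsystem generating the load.
head -n 20 trace_pipe

# Turn tracing back off when done.
echo 0 > tracing_on
echo > set_event
```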

Thanks for sticking with this!
klaxian
Status: Contributor
Joined: 18 Mar 2011
Posts: 65
Location: New York, USA
Unfortunately, this problem still exists with the latest kernel, liquorix 4.11.0-6.1. I'm surprised that this isn't more widespread, but perhaps it is something specific to my hardware? Regardless, it does not occur on liquorix <= 4.9 or on Ubuntu kernels 4.10+. As posted, this is not simply an accounting issue, and I don't think cpufreq is responsible. There are kworker processes actually using significant CPU cycles, even at idle. This may or may not be related to MuQSS. Is there anything I can do to help track down the problem? In the meantime, I am forced to remain on liquorix 4.9. Thanks for your help.
gelabs
Status: Curious
Joined: 02 May 2017
Posts: 9
The same behaviour seems to be present again on 4.14.0-20.3-liquorix-amd64.
Load average is back to 1.00 when idle.

06:19:58 up 13:58, 2 users, load average: 1,00, 1,00, 1,00
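A constant load average of 1.00 at idle usually means one task is permanently runnable or stuck in uninterruptible sleep. A quick way to spot which one (a sketch using standard procps options):

```shell
# List threads currently in the running (R) or uninterruptible-sleep (D)
# states. A kernel thread that shows up here persistently while the
# system is idle is what is holding the load average at 1.00.
ps -eLo state=,pid=,comm= | awk '$1 == "R" || $1 == "D"'
```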
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 732
Seems that you're right. Since Con released MuQSS for 4.15, I'll focus on moving to that instead, and maybe revert the rqshare patches for 4.14.
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 732
Updated 4.14 and 4.15 kernels are out in the primary repo; the Launchpad builds for the Ubuntu branches are still in progress. Let me know if the updates don't help.
gelabs
Status: Curious
Joined: 02 May 2017
Posts: 9
Unfortunately, still the same behaviour with 4.15.0-5.1-liquorix-amd64.
Load fluctuates for a while but settles at 1.00 about an hour later.

Edit: not really 1.00 - now it seems to be about 0.80/0.85.
gelabs
Status: Curious
Joined: 02 May 2017
Posts: 9
Still here with 4.15.0-6.1-liquorix-amd64

07:00:11 up 1 day, 7 min, 3 users, load average: 1,00, 1,00, 1,00