
Liquorix, -generic and -lowlatency
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
[The following was posted by: ethanay but lost due to posting bug, techAdmin]

I've been researching performance tuning of Pop!_OS for lowlatency audio production, which you can somewhat follow here: pop-planet.info/forums/threads/pro-audio-pop.249

This is the first time I've seen anyone else discuss ALL THREE real-world performance parameters of energy efficiency, throughput AND latency (pick any two), and the compromises between them.

My understanding of Liquorix is that it is the first (and only?) kernel to do what Mac OS X set out to accomplish in its earliest days ( www.cse.unsw.edu.au/~cs9242/10/lectures/09-OSXAudiox4.pdf ) by making small sacrifices to throughput performance to create relatively significant gains in latency performance without necessarily sacrificing energy efficiency. Is this correct?

I believe the -generic kernel line is slowly (and somewhat-begrudgingly) headed in this direction.

If this is all true, it sure is nice to see this sort of leadership that finally prioritizes user experience and psychology over the reductionist throughput benchmark wars that seem to be driving system design and optimization!

Are there any distros that use Liquorix by default? Any meaningful benchmarks or performance notes comparing it with the -generic and -lowlatency lines? I've already seen www.phoronix.com/scan.php?page=article&item=linux414-lowlatency-liquorix&num=1 and it seems as irrelevant as most other benchmarks.

I am making some recommendations to System76 and am wondering to what extent Liquorix or a similarly tuned kernel could replace linux-generic AND linux-lowlatency until mainstream kernel development and system performance tuning catch up to the philosophy that "latency matters, too!" and that small sacrifices in throughput yield big gains for MOST user experiences and workloads in MOST cases, as Mac engineers and users have enjoyed for two decades now.
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
Speaking only from what I've seen, the main issue for frozen pool distros (like Ubuntu and Debian stable) with Liquorix is non-free video drivers. If users run the distro-packaged driver, which is built to match the frozen kernel version, it's not uncommon for them to hit issues. That's mostly Nvidia.

For systems with free graphics drivers that's not going to be an issue, but it still appears with things like VirtualBox as well; anything that needs kernel modules rebuilt when the kernel updates can create this problem. Rolling release distros like Arch and its derivatives generally won't see this type of issue as much, since their packages that require module rebuilds are usually reasonably up to date with the current Linux kernel.

Otherwise, interesting observations, thanks.
ethanay
Status: Curious
Joined: 28 May 2020
Posts: 6
That's wonderful, thank you! Much appreciated.

Yes, so glad to find allies in this analysis. The obsession with throughput needs to stop; it provides no real benefit in the vast majority of cases. I look forward to discussing further and figuring out how to bring these views into more public light. Maybe we can shake things up a little and at least draw more attention to the problem.

I'm just an end user, I don't even do any programming but I will help however I can.

I would really just like to start with how we might help raise awareness and educate people. I'm not sure how supportive System76 will be but they have been receptive to discussing thus far...

IIRC, Pop!_OS actually is a hybrid of frozen pool and rolling updates. And because System76 has full control of hardware, it also provides hotfixes for many of the known issues that people face when installing distros on random hardware or hardware with proprietary drivers. As long as their hardware resembles the sort of hardware that S76 uses, many people have benefited from the additional hardware support that S76/Pop!_OS have provided.

It would be a dream to facilitate a collaboration and at least test, if not the Liquorix kernel itself, then a Liquorix-like kernel with Pop!_OS.

I think there is a lot of marketing potential to turn heads and attract attention, too, if a company like S76 would vocally adopt and support such a project in some way.

These are just ideas...I am still waiting for Pop!_Planet's wiki to become functional (it's not currently editable) so I can post my research to the wiki in a proper format, rather than messily splitting it up into forum posts. But if we truly are on the same page here about system performance tuning, I'd like to explore how we can collaborate further.

Maybe we could start with a white paper or something. I'd also like to bring the case to people who have the authority to make decisions at S76, whenever we feel it is ready. S76 claims to target "makers," and even the name of the company is about shaking things up (a reference to the year 1776). So hopefully they will be game to explore this with us. We shall see!

But even if we just get this issue on more people's radars and increase general understanding of comprehensive or holistic performance tuning, that will be a win for computing and bring attention back to the work being done here.
ethanay
Status: Curious
Joined: 28 May 2020
Posts: 6
Here are my thoughts on how we might proceed...

1. Develop white paper: "The Case for Lower Throughput Performance" (provocative title)

2. Post broadly to engage the Linux audio, general, and Pop!_OS communities

3. Engage S76 as a company:
   a. Collaborate to integrate these performance optimizations into S76 computers / Pop!_OS
   b. This will bring GNU/Linux, S76 and Pop!_OS closer to Mac OS X in out-of-the-box multimedia performance, and benefit non-multimedia users as well (competitive advantage)
   c. Marketing opportunity, in potential alignment with S76's philosophy and mission: a provocative turn away from the mainline kernel, shipping a "low latency" tuned kernel in the OS by default while keeping -generic as a backup for people who need the small extra throughput gain (e.g., people who compile a lot of source code)

This post also references the same philosophy: https://unix.stackexchange.com/a/564488/361160 although I don't see any way to get in touch with the authors!
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
It's easy to understand where certain kernel devs' biases come from. Almost all of them work around servers in some capacity. IO throughput is critical there, so that bias tends to show in most kernel devs' attitudes toward the kind of thing Liquorix tries to do.

As a very long time Liquorix user and early adopter, however, I can note a few things. Liquorix is a cutting-edge kernel, and the area where I've personally hit real issues most often over the years has been long IO operations: huge disk writes, USB updates, etc. Sometimes the tools used internally in the kernel to yield the desktop performance optimizations have resulted in the kernel literally 'forgetting' about the long process. This only happens now and then; if I remember right, it depends on which scheduler is being used at the time.

I think the key to advocating something is using it a lot yourself, finding that yes, in fact, it does exactly what you want it to do, and then being able to define in real ways what that something is: how it feels, what the subjective differences are, etc. The best way is obviously to get people to try it, lol; if something or other annoyed them with generic kernels, they tend to like Liquorix. Some people don't like it, for various reasons; it really varies.

I couldn't personally tell you any differences. I almost always use Liquorix on my main systems, but I'm not the type of user who can then tell you what its advantages are. I report real bugs when I find them, which is not that often, but they do happen; otherwise, in general, I'm a satisfied user. I can't personally make claims comparing it to other kernels, though maybe if I ran other kernels for half of every year and actually looked for desktop interaction differences I could say something or other. I wouldn't use Mac OS X as an example personally; maybe it does feature these types of kernel optimizations, no idea, but it never feels very snappy to me when I use it, though I don't use it very often.
damentz
Status: Assistant
Joined: 09 Sep 2008
Posts: 1122
System76 is definitely in a unique position to spin a kernel for their own Pop!_OS distribution. Because they sell the hardware they put their distribution on, they only need to guarantee that their kernel works with their hardware. If the kernel works on hardware they don't support, well, that's just a bonus for everyone else.

This is a huge issue for Liquorix, as you can imagine. I only have my own hardware to test on personally, so issues crop up that I can't reproduce locally. Just as an example [1], a recent update to MuQSS interferes with the boot procedure on some hardware configurations, requiring custom boot parameters in some instances; in others, the hardware doesn't work at all.

You're entirely correct about throughput versus latency. Perceived performance is a combination of both. For the most part, upstream Linux has throughput in the bag: in nearly every benchmark on Phoronix comparing Windows and Linux, Linux always wins. That's because nearly every benchmark measures, in some way, how long it takes to complete a task, or how many iterations of a task can be done within a given amount of time.

And ironically, on an interactive system like your desktop or laptop, all those benchmarks don't really matter. When someone measures how fast their system is, they're almost always talking about the consistency of the experience. This is exactly the problem that BFS/MuQSS was designed to solve.
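
As a rough illustration (my own sketch, nothing from the Liquorix tree), that "consistency" can be made measurable with a wakeup-lateness probe built on the standard clock_nanosleep interface:

:: Code ::
/* jitter.c - rough scheduler wakeup-lateness probe.
 * Sleeps on an absolute 1 ms tick and records how late each wakeup is.
 * Run it on an idle system, then again under load (e.g. a kernel
 * compile), and compare the averages and worst cases between kernels.
 * Build: gcc -O2 -o jitter jitter.c
 */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

#define ITER 5000
#define PERIOD_NS 1000000L /* 1 ms tick */

static int64_t ts_ns(const struct timespec *t)
{
    return (int64_t)t->tv_sec * 1000000000L + t->tv_nsec;
}

int main(void)
{
    struct timespec next, now;
    int64_t late, worst = 0, sum = 0;

    clock_gettime(CLOCK_MONOTONIC, &next);
    for (int i = 0; i < ITER; i++) {
        /* request an absolute wakeup one period after the previous one */
        next.tv_nsec += PERIOD_NS;
        if (next.tv_nsec >= 1000000000L) {
            next.tv_nsec -= 1000000000L;
            next.tv_sec++;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        clock_gettime(CLOCK_MONOTONIC, &now);
        late = ts_ns(&now) - ts_ns(&next); /* how late did we wake up? */
        if (late > worst)
            worst = late;
        sum += late;
    }
    printf("avg lateness %lld ns, worst %lld ns\n",
           (long long)(sum / ITER), (long long)worst);
    return 0;
}

The worst-case number under load is where the consistency difference shows up; average throughput benchmarks never see it.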

Let's say you had two configurations on your system that you use daily to make money:

1) An operating system that lets you run your programs at full speed. Video encoding jobs will complete sooner and video games will play at higher frame rates, but you can only do realistically one thing on the machine at a time because the experience suffers when two demanding programs are running at once.

2) An operating system that provides a consistent experience when multiple demanding applications are running at once, but your video encoding jobs take longer and games run at a lower average FPS (but have a lower frame time deviation). The trade-off is that the machine is completely usable for multitasking despite the added time for batch tasks to complete, and the lower average frame rate in video games.

Which operating systems do you think I'm describing here? It's Linux and MacOS (or, in the distance, Windows).
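
To put toy numbers on the frame-time point from scenario 2 (hypothetical frame times, purely illustrative), here is how a higher average FPS can coexist with a much worse felt experience:

:: Code ::
/* frametime.c - why "average FPS" hides stutter.
 * Build: gcc -O2 -o frametime frametime.c -lm
 */
#include <stdio.h>
#include <math.h>

static void report(const char *name, const double *ms, int n)
{
    double sum = 0, var = 0, worst = 0, mean;
    for (int i = 0; i < n; i++) {
        sum += ms[i];
        if (ms[i] > worst)
            worst = ms[i];
    }
    mean = sum / n;
    for (int i = 0; i < n; i++)
        var += (ms[i] - mean) * (ms[i] - mean);
    printf("%s: avg %.1f fps, worst frame %.1f ms, stddev %.1f ms\n",
           name, 1000.0 / mean, worst, sqrt(var / n));
}

int main(void)
{
    /* config 1: throughput-tuned, fast frames with periodic 60 ms hitches */
    double a[] = { 8, 8, 8, 60, 8, 8, 8, 60, 8, 8 };
    /* config 2: consistency-tuned, slower but nearly flat frame times */
    double b[] = { 20, 21, 20, 22, 20, 21, 20, 22, 20, 21 };
    report("config 1 (throughput)", a, 10);
    report("config 2 (consistent)", b, 10);
    return 0;
}

Config 1 wins the benchmark (roughly 54 fps average versus 48) yet keeps hitching to 60 ms frames; config 2 is the one that feels smooth.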

If you could imagine a world where software access is equal between MacOS and Linux, I think a large number of professionals would still stick to Apple's products. The behavior Apple programmed into their kernel when the operating system is under adverse conditions is far more optimal than Linux.

This is a long way of saying that I agree: System76 should pursue a new default kernel optimized for the experience a workstation user expects. I think Liquorix does a decent job of tuning Linux to behave more consistently under adverse conditions, but I'm limited by my time, my experience with C, and the costs I'm willing to pay out of my own pocket to make Liquorix far better.

[1] github.com/damentz/liquorix-package/issues/29
ethanay
Status: Curious
Joined: 28 May 2020
Posts: 6
:: Quote ::
1) An operating system that lets you run your programs at full speed. Video encoding jobs will complete sooner and video games will play at higher frame rates, but you can only do realistically one thing on the machine at a time because the experience suffers when two demanding programs are running at once.

2) An operating system that provides a consistent experience when multiple demanding applications are running at once, but your video encoding jobs take longer and games run at a lower average FPS (but have a lower frame time deviation). The trade-off is that the machine is completely usable for multitasking despite the added time for batch tasks to complete, and the lower average frame rate in video games.


So my next question is: does this relate to low latency audio performance? (ref the "glitch-free audio" PDF I linked in my first post) Does OS #2 above inherently provide lower latency and lower jitter compared to system #1? My intuition says yes, but I can't back that up right now (although it is late, I'm tired, and maybe I am missing obvious connections). I am trying to argue that, to a certain extent, tuning a computer to perform well under load with glitch-free low latency audio will also, as a byproduct, create system #2. But I want to test whether that is the case. If so, it is a powerful argument to make, as it covers both a very high-value professional use case (audio/multimedia production) and a general improvement to user experience and psychology.
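
One way to make the connection concrete (standard audio-callback arithmetic; the settings are illustrative, not from that PDF): an audio callback has a hard deadline every period, and it is scheduling delay, not throughput, that blows it:

:: Code ::
/* deadline.c - the arithmetic linking scheduler jitter to audio glitches.
 * A callback must produce period_frames samples every
 * period_frames / sample_rate seconds; if scheduling delay plus DSP
 * time ever exceeds that, the result is an xrun (audible glitch).
 */
#include <stdio.h>

int main(void)
{
    const double sample_rate = 48000.0; /* Hz */
    const int period_frames[] = { 64, 128, 256, 1024 };

    for (int i = 0; i < 4; i++) {
        double deadline_ms = period_frames[i] / sample_rate * 1000.0;
        printf("%4d frames @ 48 kHz -> callback deadline %5.2f ms\n",
               period_frames[i], deadline_ms);
    }
    /* At 64 frames the deadline is 1.33 ms: one scheduling hiccup longer
     * than that (minus DSP time) is audible. Low-deviation scheduling is
     * what protects the margin; raw throughput doesn't help. */
    return 0;
}

So, at least by this arithmetic, the consistency of OS #2 directly shrinks worst-case scheduling delay, which is what allows smaller buffers and lower audio latency.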

:: Quote ::
And ironically, on an interactive system like your desktop or laptop, all those benchmarks don't really matter. When someone measures how fast their system is, they're almost always talking about the consistency of the experience. This is exactly the problem that BFS/MuQSS was designed to solve.


I wonder how those schedulers perform in low-latency multimedia contexts. I only have experience using the -lowlatency kernel and, several years ago, an -rt kernel.
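
For context (a minimal sketch using the standard POSIX interface; the priority value is illustrative, not any particular app's): whichever kernel flavor is underneath, audio applications typically request real-time scheduling for their audio thread like this:

:: Code ::
/* rtprio.c - requesting SCHED_FIFO for the current thread.
 * Needs CAP_SYS_NICE or an rtprio rlimit (e.g. membership in the
 * 'audio' group on many distros).
 * Build: gcc -O2 -pthread -o rtprio rtprio.c
 */
#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <sched.h>

int main(void)
{
    struct sched_param sp;
    int err;

    memset(&sp, 0, sizeof sp);
    sp.sched_priority = 70; /* illustrative value */

    err = pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
    if (err) {
        fprintf(stderr, "SCHED_FIFO refused: %s "
                "(typically no rtprio rlimit)\n", strerror(err));
        return 1;
    }
    printf("running SCHED_FIFO, priority %d\n", sp.sched_priority);
    /* ...the real-time audio work would happen here... */
    return 0;
}

Where -generic, -lowlatency, -rt and Liquorix differ is in how quickly and predictably they honor that request under load.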

:: Quote ::
If you could imagine a world where software access is equal between MacOS and Linux, I think a large number of professionals would still stick to Apple's products. The behavior Apple programmed into their kernel when the operating system is under adverse conditions is far more optimal than Linux.


Yet I don't hear anyone complaining about Mac OS X lacking throughput performance. In my experience, in real-world usage, these sorts of CPU-intensive finite tasks (such as compiling or transcoding) can take minutes (even hours) to complete. And that means one of two things: 1. people get up and walk away from the computer (because the computer is unusable or they have other non-computer things to do), or 2. people task-switch while processing happens in the background, in which case they want to use the computer without lag or stutter, even if that means the background task takes a little longer to complete. I don't hear people complaining in the real world when it takes 20 minutes instead of 15-18 to compile, or 3:00 instead of 2:30 to transcode (and I think those throughput differences may be slightly exaggerated), but I hear complaints all the time when the computer stutters under the compilation load and people can't type notes or send emails while the CPU is crunching numbers in the background.

So I think we are making two arguments. On one hand, a computer manufacturer/OS designer can respect its general users more by incorporating user psychology into its performance tuning, which as a byproduct will also help the computer perform better for low latency audio. On the other hand, as a low latency multimedia user, I want to argue that certain out-of-the-box performance tuning for reliable, stable low latency audio will also improve the general user experience.

I received a response back from Harrison Mixbus developers about optimizations they would like to see on a Linux system, which you can view here: https://pop-planet.info/forums/threads/pro-audio-pop.249/post-3242

After the SMI hardware selection/testing component, I think the software optimizations fall into the following categories (a small self-check sketch follows the list):


  1. provide better low latency performance out of the box, with no notable general-user performance regression (slight tradeoffs in throughput are OK, as Mac OS X has demonstrated);
  2. enhance the general user experience (e.g., responsiveness at the sacrifice of small amounts of throughput); or
  3. make it easier for users to do any remaining system tuning themselves (odds and ends), especially tweaks that carry larger performance tradeoffs, such as reduced battery life or notable drops in throughput.
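
As a sketch of category 3 (my own illustration using the standard getrlimit interface, not something from the Mixbus list), here is a tiny self-check for the two limits that most often block low latency audio on stock distros:

:: Code ::
/* audiocheck.c - is this system allowed to do pro audio at all?
 * Reports the realtime-priority and locked-memory limits, which are
 * usually granted via /etc/security/limits.conf (audio group).
 */
#include <stdio.h>
#include <sys/resource.h>

static void show(const char *name, int which)
{
    struct rlimit rl;

    if (getrlimit(which, &rl) != 0) {
        perror(name);
        return;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu\n", name, (unsigned long long)rl.rlim_cur);
}

int main(void)
{
    show("RLIMIT_RTPRIO  (max realtime priority)", RLIMIT_RTPRIO);
    show("RLIMIT_MEMLOCK (lockable memory, bytes)", RLIMIT_MEMLOCK);
    /* pro-audio setup guides usually recommend a high rtprio limit and
     * a large (or unlimited) memlock limit for the audio group */
    return 0;
}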


So it is just a matter of binning various optimizations under these different categories. I think if you are willing and able you could be of immense help in categorizing optimizations such as those recommended by Harrison Mixbus devs according to the above. I also think your perspective could be invaluable for constructing more accurate recommendations for performance tuning, if you are willing to serve in an advisory capacity.

But first is making a clear and concise case for the importance of this issue. I think we have all the arguments, but I have not made them in a clear and concise way. One of the arguments I make is that while multimedia users are a relatively small percentage of the overall user base, multimedia applications and hardware monetize disproportionately, so there is a market incentive. For example, in the Apple store, while multimedia production applications are a small share of total apps, they make up a disproportionately large part of overall app revenue (I found a great analysis of this about a year ago but am kicking myself that I apparently didn't save the link).
ethanay
Status: Curious
Joined: 28 May 2020
Posts: 6
:: Quote ::
The trade-off is that the machine is completely usable for multitasking despite the added time for batch tasks to complete, and the lower average frame rate in video games.


I also want to highlight this question. How much lower throughput performance? And how much lower is acceptable in terms of user psychology? I think task completion times and framerates are useful metaphors, and there are thresholds in there to worry about. For example, if FPS decreases from 60 to 45, that is a non-issue. Likewise, if it takes 5 seconds instead of 4 to start a program (or 25 minutes instead of 20 to compile or transcode something), that is a non-issue. Again, I think this has more to do with user psychology than with performance benchmarks. In these contexts, even a "25% performance hit" means nothing bad for the user, especially if it results in a more responsive system capable of handling tasks with greater attention to latency and lower jitter.

A stable 30fps may be preferable to an unstable 40fps. Likewise, the threshold for latency seems to be <=10-15ms round trip. Both relate to psychological and physical limitations. I am unaware of similar psychological or physical limits for throughput, though I do not doubt they exist; I suspect we have remained well within them. Douglas Adams made fun of this in The Hitchhiker's Guide to the Galaxy with the bit about "the answer" to "the question": asked untold years before and crunched on for thousands of years, the answer finally spat out was 42, but by then no one remembered what question had originally been asked, so the answer was meaningless.

There are niche cases where that performance matters: where someone is compiling and recompiling many times a day as part of their workflow, and the amount of transcoding or compiling they can get done in a given timeframe matters, in theory, for their overall productivity. But the vanilla kernel already handles those use cases very nicely, even though they are niche cases that reflect neither how most people use their computers nor the performance they expect from them.

Especially considering how fast modern processors are today: my i5 can transcode in a fraction of the time my Core2Duo took.
techAdmin
Status: Site Admin
Joined: 26 Sep 2003
Posts: 4127
Location: East Coast, West Coast? I know it's one of them.
I'm getting the feeling you are confusing two distinctly different things here. Audio processing, as in recording, live work, etc., is almost the exact opposite use case from what Liquorix strives to achieve. Generally, when it comes to real audio recording, people talk about realtime kernels.

I don't follow this closely, but it looks to me like you may be confusing overall desktop fluidity and 'feel' with some strict throughput concepts, particularly with pro audio recording. These two things are, I believe, not compatible, as the examples damentz provided showed fairly clearly.

If you want nothing to interrupt a stream of data, as in no-jitter audio recording, then that's the opposite of the use case Liquorix, and actually you, are talking about. That's a case of wanting to have one's cake and eat it too; pick one, you can't have both.

Your question re 'how much lower throughput performance' is to me one of the keys of the Liquorix kernel; I've watched damentz experiment with that for years, and it may be one of the main factors, actually. As I noted initially, I've seen these tests result in kernels that literally 'forgot' about long I/O operations because the scheduler basically lost sight of them. Obviously, that's too far in the balance away from I/O throughput, so those kernels generally don't last more than one release.

I can tell you as a software person that if you specify two totally incompatible core requirements for a kernel, you end up with Windows Vista. You have to understand what contradicts what.

Read up on audio and realtime kernels. I stopped following that stuff, but I know that when I was following it, everyone working on real audio projects, as in live recording, was focused on realtime kernels to get rid of the jitter you sometimes see when the kernel gets distracted by another process that needs attention.

While I'm not speaking for Liquorix, I can roughly say that the balance is the project, and one thing can never be all things to all users, so make sure you aren't asking for certain things that make other things impossible.

But whatever you do, don't specify things that will make each other fail. You have to be clear on what you are looking for, and in this case you are looking at desktop feel, latency, etc., NOT at I/O throughput in the strict sense.
ethanay
Status: Curious
Joined: 28 May 2020
Posts: 6
Interesting, thank you for your thoughts. I was under the impression that there is significant overlap between the needs of desktop responsiveness, low jitter and audio I/O reliability, as they all fall under the notion of prioritizing human interaction (audio, video and input devices) over throughput work threads.

I know that several Linux-based audio users have reported using Liquorix with good results (e.g., AV Linux).

So I am surprised to hear that you think Liquorix is almost exactly the opposite of the scenario I am describing.

Within the realm of generally prioritizing system responsiveness (especially under load), there will be conflict. Apple puts audio threads first, as the PDF I linked to mentions, which I think is a sound decision (no pun intended). But I am under the impression that audio, visual and HID responsiveness can all be improved at the expense of throughput; I am not under the impression that they can all be improved EQUALLY. And I think it makes sense to prioritize audio I/O over video and interface devices (e.g., mouse and keyboard), for psychological reasons.

So I thought we were talking about priorities, not outright performance conflicts. An example of the latter, IMO, would be: "I want a low latency system that has the highest possible throughput and lowest energy use." Something's gotta give.

Within the realm of user interaction, there are design decisions that need to be made about which threads get the highest priority. Users are very sensitive to disruptions in audio, slightly less sensitive to disruptions in video (again this depends on usage context), and slightly less sensitive to disruptions in mouse/keyboard input, as those are usually entered in response to auditory or visual stimuli.

In terms of realtime kernels, my understanding is that they are becoming increasingly rare and niche because a. the -generic kernel line has improved its responsiveness tuning, especially under load, and b. people are beginning to understand that the extreme sacrifices of a "realtime kernel" are, like your other example, a bit too far. The -lowlatency kernel line has survived because it pushes performance just far enough away from an emphasis on throughput without losing many of the benefits of the -generic line.

No-jitter is not a realistic goal. Nor is "realtime audio." What I think we are finding is that we only need to keep things within the human limits of psychological imperceptibility, which seems to be about 10ms or so:

Audio: whirlwindusa.com/support/tech-articles/opening-pandoras-box/
Visual: https://www.youtube.com/watch?v=vOvQCPLkPt4
Input: https://stackoverflow.com/questions/6880856/whats-the-minimum-lag-detectable-by-a-human#6880891

All three of these elements need to tie together into a consistent latency of <10-15ms round trip, from input to output.
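
To sanity-check that budget with the standard period arithmetic (illustrative settings, a back-of-envelope sketch only):

:: Code ::
/* budget.c - a rough audio round-trip budget against the ~10-15 ms
 * threshold cited above, using the usual JACK-style estimate.
 */
#include <stdio.h>

int main(void)
{
    const double rate = 48000.0;   /* sample rate, Hz */
    const double frames = 128.0;   /* period size */
    const double nperiods = 2.0;   /* playback buffer periods */
    const double converters = 2.0; /* rough ADC + DAC latency, ms */

    double capture_ms  = frames / rate * 1000.0;            /* one period in */
    double playback_ms = frames * nperiods / rate * 1000.0; /* buffered out */
    double total_ms    = capture_ms + playback_ms + converters;

    printf("capture %.2f ms + playback %.2f ms + converters %.1f ms = %.1f ms\n",
           capture_ms, playback_ms, converters, total_ms);
    /* 128 frames @ 48 kHz, 2 periods: 2.67 + 5.33 + 2.0 = 10.0 ms,
     * right at the threshold; which is why stable scheduling at small
     * buffer sizes matters more here than raw throughput. */
    return 0;
}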

For most users in most cases (where humans and computers are interacting), "realtime" does not mean "zero latency/zero jitter" but "latency and jitter low enough to be negligible and imperceptible." Anything beyond that is overkill for most applications, other than, e.g., NASA AI making split-second microcorrections to keep a rocket trajectory stable. But that is a completely different use case, one where machines interact with and respond first and foremost to physical stimuli in their environment, rather than feeding or receiving information from humans.

Maybe I am way off base here, but the question to me is a. how we define these priorities in relation to each other, and b. how they relate to other system processes. That amounts to four different priority levels.

A very simplistic example for a single-core processor handling the three input/output tasks would be to allocate sufficient resources for each type of stimulus to process in under 5ms on average, with low enough jitter that deviations in min/max are not perceptible, probably something like a 3-5ms deviation. Interface-device input routing is relatively non-intensive compared to audio and video processing, so it probably doesn't need anywhere near the full 5ms, but it also occurs much more frequently, and thus risks interrupting the more intensive audio and video processing if given too high a priority (worst-case scenario: audio cuts out/xruns due to typing or moving the cursor). Maybe this looks like <15ms total latency, with priority given to audio, video and human input devices, in that order.

There is a philosophical question of whether a camera, for example, should be considered under the "video" or the "input device" category. I believe it is the latter, because of the interactive dynamics: we respond to what we see and hear, so we by definition need to see and hear before we can respond. This is a bit different from, say, taking an audio interface and lumping it in with all audio threads at the highest priority.

I agree about different strokes for different folks: having some accessible way for users to quickly and easily set these priorities according to their intended workflow makes the most sense. For example, someone might do audio recording and want all audio (including human audio interface devices) at the highest priority, but then do some word processing while listening to glitch-free audio in the background and want to avoid cursor lag.

Although I don't play video games anymore, I think they are a great test bed for optimizing a system for human interaction, as they have to balance all four of these concerns at once: audio, video, human input, and overall system performance/throughput. I see this happening in the audio world, with pro audio engineers regularly designing and building their own dedicated DAW computers using gaming equipment, and even using performance optimizations originally targeted at gamers.

I see a system optimized for audio as a derivative of this more general optimization, rather than as the opposite goal. Thank you for the discussion! I look forward to hearing your thoughts; I am learning a lot through this process.