Linux kernel virtual memory and pages
techAdmin
Note: this was originally meant as a response to an antix forum thread: www.antixforum.com/forums/topic/what-are-you-here-with-today/page/85/#post-104513 but their spam filter ate it, so I am posting it here. The question was about why different tools, like inxi, free, etc., show different memory used results than some other tool.

I can't give you the full story about memory use, but I can give you a rough, probably slightly inaccurate overview of it.

Every tool that deals with memory is making decisions about what to consider memory and what to ignore.

The kernel divides memory into 'pages' of some fixed size; I forget how big: www.cs.rpi.edu/academics/courses/fall04/os/c12/

:: Quote ::
Pages are typically 512 to 8192 bytes, with 4096 being a typical value. Page size is virtually always a power of two, for reasons to be explained below. Loadable modules are also divided into a number of page frames. Page frames are always the same size as the pages in memory.


There is nothing subjective about memory, it's not metaphysics, but it is extremely ill-defined, and basically no tool reports with crystal clarity why it made the decisions it made. inxi is no exception to this: it uses what is considered a reasonable metric to determine memory use.

Understanding how the kernel actually works with virtual memory is basically understanding how the kernel works, which is why very few non-kernel people, including me, really understand it.

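As you can see from the above quote, the page size can vary quite widely between systems, so there is no absolute there. On Linux, the base page size is set by the CPU architecture and the kernel build configuration rather than changed at runtime: x86-64 uses 4096 bytes, while some architectures, such as arm64, can be built with 16 KiB or 64 KiB pages, so you'd pick a page size based on the requirements of your application/server/machine/system. Larger 'huge pages' can also be configured on top of that.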

So that's a first place where a reporting error can creep in: assuming a page size of, say, 512 bytes when an actual page is 4096 bytes. I am aware of this pitfall because I have spent a lot of time trying to determine categorically what unit size the data in, say, /sys, or the reports in /proc, use for things like storage blocks. In that case, I was finally able to find kernel docs that made it clear that while it appeared possible to have different units, what the kernel actually used internally was 512 bytes per unit, if I remember right.

That was for block devices, but the issue exists everywhere: is the size of the unit variable, or is it absolute?
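A minimal Python sketch of the block device case, assuming the Linux sysfs convention that /sys/block/<dev>/size is always counted in 512-byte sectors, whatever the drive's real logical block size is. The function names are hypothetical helpers for illustration, not anything taken from inxi.

```python
from pathlib import Path

# Assumption (Linux sysfs convention): block device sizes are reported in
# 512-byte sectors, even on drives whose logical block size is 4096 bytes.
SECTOR_BYTES = 512

def sectors_to_bytes(sectors: int) -> int:
    """Convert a sysfs sector count to bytes."""
    return sectors * SECTOR_BYTES

def block_device_bytes(dev: str, sys_root: str = "/sys/block") -> int:
    """Size of a block device such as 'sda' in bytes (Linux only)."""
    sectors = int(Path(sys_root, dev, "size").read_text())
    return sectors_to_bytes(sectors)

# A typical "1 TB" drive reports 1953525168 sectors:
print(sectors_to_bytes(1953525168))  # 1000204886016
```

If you wrongly assumed 4096-byte units here, you'd overstate the size eightfold, which is exactly the kind of unit error described above.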

Say you count how many pages there are: is the software assuming a certain size that is in fact not constant, or is the size constant?
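Here's a small Python sketch of converting a page count into real units by asking the OS for the page size instead of assuming one. It reads /proc/self/statm, whose fields are counted in pages (Linux only); the helper names are hypothetical.

```python
import os
from typing import Optional

def page_size() -> int:
    """The page size the running system actually uses, in bytes."""
    return os.sysconf("SC_PAGE_SIZE")

def pages_to_kib(pages: int, size: Optional[int] = None) -> int:
    """Convert a page count to KiB, using the real page size by default."""
    if size is None:
        size = page_size()
    return pages * size // 1024

def self_resident_kib() -> int:
    """Resident memory of this process; statm's 2nd field is in pages."""
    with open("/proc/self/statm") as f:
        resident_pages = int(f.read().split()[1])
    return pages_to_kib(resident_pages)

# The same 1000 pages mean very different amounts of memory:
print(pages_to_kib(1000, 4096))   # 4000 (KiB, with 4 KiB pages)
print(pages_to_kib(1000, 65536))  # 64000 (KiB, with 64 KiB pages)
```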

This is just on the raw page size area, which I can't remember looking into, because I think the tools I use to get that data already have done that work, so I didn't have to worry about it.

Looking at the inxi class MemoryData, we note it's 240 lines long, though it has to handle several different BSD methods as well as Linux. Happily, inxi needs only one single method for Linux, /proc/meminfo, which is uncommon. The kernel guys were nice and politely added the units, kB, which normally means kilobyte, aka 1000 bytes: 1 KiB == 1024 bytes, but 1 kB == 1000 bytes, ouch.

This is the dreaded is-it-KiB-or-kB-or-KB ambiguity. However, looking at the numbers below, the label turns out to be misleading: despite saying kB, /proc/meminfo actually delivers base 2 units of 1024 bytes. The kernel converts its page counts to 1024-byte units and simply prints kB after them.

note: KB/KiB == 1024 = 2^10 bytes, and kB = 1000 bytes.

So the units are almost known: the kernel labels them for us, but you still have to know that the label really means KiB. Note that this handling can never be assumed, and you have to know where the tool gets the data from.

Now, when you run: cat /proc/meminfo, you get a spray of different values, tons of them, globs.

A few are apparently unambiguous, except they aren't. MemTotal is how much memory the system has to use; the kernel docs describe it as total usable RAM, i.e. physical RAM minus a few reserved bits and the kernel binary code.

For example:
:: Code ::
inxi -Ix
Memory: 31.28 GiB used: 22.47 GiB (71.9%)

cat /proc/meminfo
...
MemTotal:       32794220 kB


So inxi is not tricked: it treats these kB values as KiB, which is what they really are, so 32794220 KiB is converted to 31.28 GiB (32794220 / 1024 / 1024). inxi internally uses KiB as the standard size unit, and everything is converted to that to make it consistent.

So this is an area where small errors can creep in, if the values were treated as true 1000-byte kB. Not a huge error, but substantial: with mine, 32794220 kB at 1000 bytes each is 32.8 GB in decimal units, but only 30.54 GiB, instead of the correct 31.28 GiB.
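A minimal Python sketch of reading /proc/meminfo under the assumption above: that the kernel's 'kB' label means 1024 bytes, which is the only way 32794220 comes out to the 31.28 GiB inxi printed. This is not inxi's MemoryData code (inxi is written in Perl); the function names are hypothetical.

```python
from pathlib import Path

def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: int value}.

    Values labelled 'kB' are kept as-is: the kernel means KiB (1024 bytes)
    by that label. Unitless fields (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info

def read_meminfo(path: str = "/proc/meminfo") -> dict:
    return parse_meminfo(Path(path).read_text())

def kib_to_gib(kib: int) -> float:
    return kib / 1024 / 1024

sample = "MemTotal:       32794220 kB\nMemFree:         3581780 kB\n"
info = parse_meminfo(sample)
print(f"{kib_to_gib(info['MemTotal']):.2f} GiB")  # 31.28 GiB
# What you'd get by wrongly treating the label as 1000-byte kB:
print(f"{info['MemTotal'] * 1000 / 1024**3:.2f} GiB")  # 30.54 GiB
```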

The interesting fields in meminfo for inxi are:
MemTotal, MemFree, Buffers, Cached, MemAvailable.

Not all of these are certain to be there, and inxi is trusting the kernel's view of what is considered what.

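MemAvailable is preferred. Despite the name, it's not simply the free memory (that's MemFree): it's the kernel's estimate of how much memory is available for starting new applications without swapping.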
MemAvailable: 9206616 kB
so: 32794220 - 9206616 = 23587604

if that is not available, inxi will do some math:
$total - ($free + $buffers + $cached)

which would be:
32794220 - (3581780 + 698708 + 3312180) = 25201552

You'll note the two are not exactly the same. That's because the first is what the kernel has decided is available, while the second is some simple math that is probably missing a few values to subtract, but it's close enough.
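The two calculations above, as a small Python sketch. It mirrors the described logic but is not inxi's actual Perl code; the function name is hypothetical, and defaulting a missing Buffers or Cached field to 0 is my assumption, given that not all fields are certain to be there.

```python
def used_kib(m: dict) -> int:
    """Memory 'used' in KiB from /proc/meminfo values (all in KiB).

    Prefers the kernel's MemAvailable estimate; falls back to
    total - (free + buffers + cached) when MemAvailable is missing
    (it first appeared in kernel 3.14).
    """
    total = m["MemTotal"]
    if "MemAvailable" in m:
        return total - m["MemAvailable"]
    return total - (m["MemFree"] + m.get("Buffers", 0) + m.get("Cached", 0))

sample = {"MemTotal": 32794220, "MemFree": 3581780, "Buffers": 698708,
          "Cached": 3312180, "MemAvailable": 9206616}
print(used_kib(sample))  # 23587604

# Same numbers, pretending the kernel has no MemAvailable field:
fallback = {k: v for k, v in sample.items() if k != "MemAvailable"}
print(used_kib(fallback))  # 25201552
```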

so now we are roughly at the meat of the question:

Where does inxi get this data from? It gets it directly from the kernel, and agrees not to disagree with how the kernel guys have decided to determine what RAM is available or not available.

However, there's more going on with memory pages than that. The kernel can and does pile up pages of file data in RAM (the page cache), on the assumption that it's always faster to access something from memory than by reading some file or doing some I/O. So it just leaves those pages ready for use even if they are not active, and if it needs that memory back, it reclaims those pages and reuses them for fresh data.

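That's roughly what Buffers and Cached in /proc/meminfo mean: Cached is mostly the page cache of file data, and Buffers is a smaller amount of temporary storage for raw disk blocks.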

So some utilities will decide to report only a subset of the RAM actually in use at that exact moment. Here's free on the same machine:

:: Code ::
free
               total        used        free      shared  buff/cache   available
Mem:        32794220    23545212     3631288      772136     6859056     9249008
Swap:       16383996     3665120    12718876


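As you can see, free's 'used' column is exactly total minus available: 32794220 - 9249008 = 23545212. That's the same calculation inxi uses as its primary method with MemAvailable; the numbers differ slightly from the meminfo values above only because they were read at different moments. The buffers + cached subtraction is just inxi's fallback.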
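A quick Python check of that arithmetic using the free output above. My understanding from the free(1) man pages is that older procps-ng versions documented 'used' as total - free - buffers - cache, while newer ones use total - available; treat that version split as an assumption and check `man free` on your own system. The function name is hypothetical.

```python
def free_used(total: int, free: int, buff_cache: int, available: int) -> dict:
    """Two ways free(1) has computed its 'used' column (all values in KiB)."""
    return {
        # newer procps-ng free: used = total - available
        "total - available": total - available,
        # older procps-ng free: used = total - free - buff/cache
        "total - free - buff/cache": total - free - buff_cache,
    }

# Numbers from the free output above:
for name, kib in free_used(32794220, 3631288, 6859056, 9249008).items():
    print(f"{name}: {kib}")
# total - available: 23545212   (matches the 'used' column)
# total - free - buff/cache: 22303876
```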

But there are lots of other ways of doing it.

For example, you can ignore the buffered and cached memory, which is probably what is happening when a utility reports dramatically different RAM use results, but that means it's using a different and not really standard approach. inxi uses what the kernel uses, and what free uses, which is the same thing. The MemAvailable/available item isn't simply the unassigned, uncached, unbuffered RAM, though (that's MemFree); it also counts the cache and buffers the kernel could reclaim without swapping, which is why it's so much larger than MemFree here (9206616 kB vs 3581780 kB). Keep in mind that the kernel always tries to use as much RAM as possible, since it's the fastest way to access and store data short term, except for CPU cache, of course.

There are entire books written about kernel virtual memory and pages, so I am not going to pretend I fully understand the topic. I did have an acquaintance who actually studied this exact topic under the main FreeBSD kernel guy, but the information never seeped out of his brain and into mine, because it's non-trivial, and really of interest only if you are going to be working with the stuff directly.

However, if you see a program that differs widely from free, /proc/meminfo, and thus from inxi, you can be almost certain that it is using a much more restrictive idea of what memory in use is. The real idea is that if you have stored a page and it occupies memory, even if it's only being kept on the off chance something will request it again, that memory is in fact being used, even though that cache/buffer could be cleared and reused by something else that needs it now, not in the future.

The above article is not a bad read, and roughly explains what is going on.

But basically it's a fairly significant abuse of the term 'used' vs 'free' to ignore pages of memory that exist, are assigned, and available to the kernel, but which maybe exist only as non active buffers or caches.

It's kind of like having a glass of water on the counter or table and not counting it as filled because you can refill the glass with juice or whiskey at any moment.
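To put numbers on the glass of water, here's a Python sketch computing three defensible definitions of 'used' from the same meminfo values shown earlier. The labels and function name are mine, for illustration only; the point is the spread, not which one is 'right'.

```python
def used_views(m: dict) -> dict:
    """Percent of MemTotal counted as 'used' under three definitions."""
    total = m["MemTotal"]
    views = {
        # every page holding anything, cache and buffers included
        "total - MemFree": total - m["MemFree"],
        # the kernel's estimate of what can't be handed out without swapping
        "total - MemAvailable": total - m["MemAvailable"],
        # buffers and page cache treated as if they were empty glasses
        "total - (free + buffers + cached)":
            total - (m["MemFree"] + m["Buffers"] + m["Cached"]),
    }
    return {name: round(100 * kib / total, 1) for name, kib in views.items()}

meminfo = {"MemTotal": 32794220, "MemFree": 3581780, "Buffers": 698708,
           "Cached": 3312180, "MemAvailable": 9206616}
for name, pct in used_views(meminfo).items():
    print(f"{name}: {pct}%")
# total - MemFree: 89.1%
# total - MemAvailable: 71.9%
# total - (free + buffers + cached): 76.8%
```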

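I believe the point of pages is that a page doesn't have to stay at one fixed location: it can sit at various physical addresses in RAM, or be out on a swap partition, and when that page is requested, the kernel's page tables know its actual location. (The CPU caches sit below all this and are handled transparently by the hardware, not tracked in the page tables.)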
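To make the lookup idea concrete, here's a toy Python sketch of how a virtual address splits into a page number and an offset, and how a page table maps that page to a physical frame. It's a simplification under stated assumptions: 4 KiB pages, and a plain dict standing in for the real multi-level page tables the hardware walks; a missing entry stands for a page that isn't in RAM (for example, swapped out), where the real kernel would take a page fault. It also shows why the power-of-two page size from the quote is handy: the split is just a shift and a mask.

```python
PAGE_SIZE = 4096                         # assumption: 4 KiB pages, as on x86-64
PAGE_SHIFT = PAGE_SIZE.bit_length() - 1  # 12, since 4096 == 2**12

def split_address(vaddr: int) -> tuple:
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr >> PAGE_SHIFT, vaddr & (PAGE_SIZE - 1)

def translate(vaddr: int, page_table: dict) -> int:
    """Map a virtual address to a physical one via a toy page table.

    page_table maps virtual page number -> physical frame number; a missing
    entry plays the role of a non-resident page (e.g. swapped out).
    """
    vpn, offset = split_address(vaddr)
    frame = page_table.get(vpn)
    if frame is None:
        raise LookupError(f"page {vpn:#x} not in RAM: page fault")
    return frame * PAGE_SIZE + offset

table = {0x12345: 0xABC}
print(hex(translate(0x12345678, table)))  # 0xabc678
```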

I'm almost 100% certain that I have some of this wrong, but that's roughly how it works. My advice: be suspicious of the data sources of any tool that gives significantly different results, particularly if they are much lower.

But those tools are not wrong; they are just treating the concept of used vs available as fundamentally different from how free and meminfo treat it. That can be useful: if you need to know the currently active pages only, and don't need to worry about buffers or caches, that is also useful information, as long as you understand the difference. I believe this is why the kernel guys added the available/MemAvailable items some years ago (MemAvailable appeared in kernel 3.14): they estimate how much memory is free and available for new work without swapping. Part of that available memory may actually be holding cache at the moment of the report, but it can be handed over if something requires it.

There's also another, far thornier issue: a CPU that shares RAM with a GPU, like an all-in-one APU, or a video chip that uses system RAM instead of its own GPU RAM. These will often change the RAM total, or the RAM used, depending on the system, without you being able to access that data.

The Raspberry Pi is the only system I know of that reports the GPU RAM used as an available data type, which inxi will use; no other GPU RAM use is exposed that I know of. That's why, if you run inxi on a Pi, you'll see a new data item for RAM: gpu:.

Anyway, that's roughly how it works, give or take many errors on my part. I'd better leave it at that before I make a real error in explaining it.