
A Kernel Inside Your Kernel — Complete Technical Guide


This page contains the complete technical reference for the video A Kernel Inside Your Kernel. Every parameter, every command, every configuration file — explained in full detail.

The goal is simple: your distribution made hundreds of kernel decisions for you. Most of those decisions were made for the average machine, the average workload, the average user. Your system is none of those things. This guide shows you how to take those decisions back.

📄 Written guide vs video
A video has to move. Explaining every nuance and edge case of every kernel parameter would take three hours. So in the video I kept some explanations intentionally simplified — clear enough to understand the concept, not complete enough to cover every situation. This written guide goes further. Where the video made a clean generalisation, this page adds the caveats, the alternative approaches, and the corrections. If something in the video sounded absolute, check the corresponding section here.

⚠️ IMPORTANT
Read before applying anything. These are not magic tweaks — they are trade-offs. Every parameter here has a cost. Understand what you are changing and why before touching your system. When in doubt, test on a non-critical machine first. The Arch Wiki and Gentoo Wiki are your best friends for deeper reading.


CONTENTS
  1. Runtime Parameters — sysctl
  2. Memory Management
    1. vm.swappiness
    2. vm.vfs_cache_pressure
    3. Dirty pages
    4. Transparent Huge Pages (THP)
    5. Kernel Same-page Merging (KSM)
    6. vm.max_map_count
    7. Pressure Stall Information (PSI)
  3. CPU and Scheduling
    1. Scheduler autogroup
    2. CPU frequency governors
    3. CPU security mitigations
    4. Tickless kernel and RCU
  4. Storage and I/O
    1. I/O schedulers
    2. Zswap
    3. Read-ahead
  5. Kernel Forensics — Reading the Signals
  6. Personal Baseline Configuration

1. Runtime Parameters — sysctl

The Linux kernel exposes hundreds of runtime parameters through the /proc/sys/ virtual filesystem. You can read and write these parameters without rebooting, without recompiling, without breaking anything. The tool for this is sysctl.

To see every active parameter on your system:

sysctl -a

To read a single parameter:

sysctl vm.swappiness

To change a parameter at runtime (temporary — lost on reboot):

sudo sysctl -w vm.swappiness=10

To make changes permanent, add them to:

/etc/sysctl.conf

Or drop a file into:

/etc/sysctl.d/99-custom.conf

After editing, apply without rebooting:

sudo sysctl --system
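A drop-in file accumulates keys over time, and it is easy to lose track of what it actually sets. This sketch lists every key in the file next to its current live value; the path matches the one used in this guide, and the extract_keys helper is my own, not a standard tool:

```shell
#!/bin/sh
# Sketch: show the live value of every key set in a sysctl drop-in.
# Adjust "conf" to your own file.
conf=/etc/sysctl.d/99-custom.conf

# Extract "key" from "key = value" lines, skipping comments and blanks.
extract_keys() {
  awk -F'=' '/^[[:space:]]*[#;]/ {next} /=/ {gsub(/[[:space:]]/, "", $1); print $1}' "$1"
}

if [ -r "$conf" ]; then
  for key in $(extract_keys "$conf"); do
    printf '%s = %s\n' "$key" "$(sysctl -n "$key")"
  done
fi
```

Run it after sysctl --system to confirm that what you wrote is what the kernel is actually using.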

2. Memory Management

2.1 vm.swappiness

Controls how willing the kernel is to move process memory to swap. This is not a percentage threshold — it is a relative cost value. The kernel docs define it as the relative cost of reclaiming swap-backed memory versus reclaiming page cache. Higher values make the kernel more willing to swap process memory. Lower values favour keeping processes in RAM and reclaiming page cache instead.

Check the current value:

cat /proc/sys/vm/swappiness

Default: 60. The range is 0 to 200. The default value of 60 is a generic compromise, not a desktop-specific recommendation. On some desktop systems it can lead to more swapping than desired, but the optimal value depends on RAM size, workload, and swap configuration.

For many desktop systems, a lower value such as 10 can be a sensible starting point, but it should be treated as a baseline to test, not as a universal rule:

sudo sysctl -w vm.swappiness=10

Make it permanent:

echo "vm.swappiness=10" | sudo tee -a /etc/sysctl.d/99-custom.conf

Important nuance the video did not cover: the point is not that 10 is always better. The point is that 60 was chosen for a generic case that may not be yours. Measure, observe, decide.
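"Measure, observe, decide" can be made concrete. This sketch samples the kernel's cumulative swap-out counter from /proc/vmstat over an interval, so you can compare swappiness values under a real workload; the read_counter helper is hypothetical, not a standard utility:

```shell
#!/bin/sh
# Sketch: measure how many pages were swapped out over an interval.
# Usage: run it while your normal workload is active; optional first
# argument is the interval in seconds (default 3).

# Read one cumulative counter from /proc/vmstat (or a test file).
read_counter() {
  awk -v k="$1" '$1 == k {print $2}' "${2:-/proc/vmstat}"
}

if [ -r /proc/vmstat ]; then
  secs=${1:-3}
  before=$(read_counter pswpout)
  sleep "$secs"
  after=$(read_counter pswpout)
  printf 'pages swapped out in %ss: %d\n' "$secs" $((after - before))
fi
```

A steady stream of non-zero deltas during ordinary desktop use is the signal that your current swappiness (or your RAM) deserves a second look.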

2.2 vm.vfs_cache_pressure

Controls how aggressively the kernel reclaims memory used for filesystem metadata — directory entries (dentries) and inodes. Lower values keep more metadata cached in RAM. Higher values reclaim that cache more aggressively.

Check the current value:

cat /proc/sys/vm/vfs_cache_pressure

Default: 100. On a desktop with plenty of RAM, lowering this makes filesystem operations feel snappier — opening directories, listing files, searching.

sudo sysctl -w vm.vfs_cache_pressure=50

Make it permanent:

echo "vm.vfs_cache_pressure=50" | sudo tee -a /etc/sysctl.d/99-custom.conf

Going deeper than the video: on modern kernels, vm.vfs_cache_pressure should also be considered alongside vm.vfs_cache_pressure_denom, which defines the reference value used to interpret its aggressiveness. The effective reclaim ratio is pressure / pressure_denom. The denominator defaults to 100, so setting pressure to 50 gives a ratio of 0.5 — the kernel reclaims dentry/inode cache at half the normal rate. The value of 50 proposed here is correct with the default denominator, but it is worth knowing the denominator exists if you want finer control without touching the main value.
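To see the two values together, here is a quick sketch. Note that vm.vfs_cache_pressure_denom only exists on recent kernels, so the script skips the calculation when it is absent; reclaim_ratio is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: compute the effective dentry/inode reclaim ratio
# from vm.vfs_cache_pressure and its denominator.
reclaim_ratio() {
  awk -v p="$1" -v d="$2" 'BEGIN { printf "%.2f\n", p / d }'
}

p=$(sysctl -n vm.vfs_cache_pressure 2>/dev/null)
d=$(sysctl -n vm.vfs_cache_pressure_denom 2>/dev/null)
if [ -n "$p" ] && [ -n "$d" ]; then
  printf 'effective reclaim ratio: %s\n' "$(reclaim_ratio "$p" "$d")"
fi
```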

2.3 Dirty pages — vm.dirty_ratio and vm.dirty_background_ratio

When you write a file, data goes to a memory buffer first (dirty pages) and is flushed to disk later. These two parameters control when that flushing happens.

vm.dirty_ratio — maximum percentage of RAM that can contain dirty data before the kernel blocks new writes and forces a flush. Default is typically 20%.

vm.dirty_background_ratio — percentage at which background flushing begins quietly, without blocking applications.

On a machine with 32 GB RAM, 20% means over 6 GB of unflushed data. When the kernel finally flushes it all at once, disk activity spikes and the system stutters.

Lower both values for smoother, more predictable write behaviour:

sudo sysctl -w vm.dirty_ratio=10
sudo sysctl -w vm.dirty_background_ratio=5

Make permanent:

echo "vm.dirty_ratio=10" | sudo tee -a /etc/sysctl.d/99-custom.conf
echo "vm.dirty_background_ratio=5" | sudo tee -a /etc/sysctl.d/99-custom.conf

Going deeper than the video: percentage-based limits scale with your total RAM, which can be unpredictable on machines with a lot of memory. The kernel also offers fixed-byte equivalents: vm.dirty_bytes and vm.dirty_background_bytes. If you set these, the ratio variants are ignored. On a machine with 32 GB or more, fixed limits give you more control:

# Example: cap dirty data at 512 MB, start background flush at 128 MB
sudo sysctl -w vm.dirty_bytes=536870912
sudo sysctl -w vm.dirty_background_bytes=134217728

This is more work to calculate, but the behaviour is predictable regardless of how much RAM you add later. The ratio approach proposed in the video is a sensible starting point — just be aware the bytes variant exists and is often the better choice on high-RAM machines.
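If you would rather think in megabytes than raw byte counts, a trivial helper avoids hand-calculating values like 536870912. The mb_to_bytes name is mine:

```shell
#!/bin/sh
# Helper: convert a size in MB to the byte value the sysctl expects.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

echo "vm.dirty_bytes=$(mb_to_bytes 512)"              # 536870912
echo "vm.dirty_background_bytes=$(mb_to_bytes 128)"   # 134217728
```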

2.4 Transparent Huge Pages (THP)

Modern CPUs support multiple memory page sizes. Normal pages are 4 KB. Huge pages are 2 MB. Larger pages reduce TLB misses and can improve performance for memory-intensive workloads. THP lets the kernel use huge pages automatically, without applications needing to request them explicitly.

Check the current setting:

cat /sys/kernel/mm/transparent_hugepage/enabled

You will see one of: [always], [madvise], or [never]. The active value is shown in brackets.

The kernel documentation does not prescribe a universal recommendation — it exposes the mechanism and leaves the choice to you. In the video I suggested madvise as a reasonable default for desktop use because it avoids the defragmentation overhead of always while still letting applications that genuinely benefit from huge pages (databases, VMs) request them explicitly. It is a prudent baseline, not an absolute rule.

To set madvise at runtime:

echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled

To make this permanent, add to your bootloader kernel parameters. With GRUB, edit /etc/default/grub and add to GRUB_CMDLINE_LINUX_DEFAULT:

transparent_hugepage=madvise

Then regenerate the GRUB configuration. The command depends on your distribution:

# Debian / Ubuntu
sudo update-grub
 
# Arch Linux
sudo grub-mkconfig -o /boot/grub/grub.cfg
 
# Fedora / RHEL
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
 
# Void Linux (if using GRUB)
sudo grub-mkconfig -o /boot/grub/grub.cfg
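To verify THP is doing anything at all, check how much anonymous memory is currently backed by huge pages. The AnonHugePages field in /proc/meminfo is real; the anon_huge_kb helper here is a hypothetical convenience:

```shell
#!/bin/sh
# Sketch: report anonymous memory currently backed by huge pages.
anon_huge_kb() {
  awk '/^AnonHugePages:/ {print $2}' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
  printf 'Anonymous memory in huge pages: %s kB\n' "$(anon_huge_kb)"
fi
```

With madvise, expect this number to stay low unless an application explicitly opts in; a permanently zero value under always would suggest something is preventing collapse into huge pages.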

2.5 Kernel Same-page Merging (KSM)

KSM is memory deduplication at the kernel level. When multiple processes have identical memory pages, the kernel merges them into a single physical page. Useful if you run many virtual machines with similar operating systems.

Check if KSM is running:

cat /sys/kernel/mm/ksm/run

0 = disabled, 1 = enabled. To enable:

echo 1 | sudo tee /sys/kernel/mm/ksm/run

Trade-off: KSM uses CPU time to scan memory. On a desktop without VMs, the overhead is not worth it. Enable only if you run containers or virtual machines regularly.
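Before leaving KSM on, measure what it actually saves. The counters under /sys/kernel/mm/ksm are the kernel's; the ksm_saved_kb helper and the 4 KB page-size assumption are mine:

```shell
#!/bin/sh
# Sketch: estimate how much memory KSM is saving right now.
# pages_sharing counts page mappings deduplicated into pages_shared
# physical pages; the difference, times the page size, is the saving.
ksm_saved_kb() {
  sharing=$(cat "$1"/pages_sharing)
  shared=$(cat "$1"/pages_shared)
  echo $(( (sharing - shared) * 4 ))   # assumes 4 KB pages
}

dir=/sys/kernel/mm/ksm
if [ -r "$dir/pages_sharing" ]; then
  printf 'KSM is saving roughly %s kB\n' "$(ksm_saved_kb "$dir")"
fi
```

If the number stays near zero after a day of running your VMs, the scanning overhead is buying you nothing; turn KSM back off.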

2.6 vm.max_map_count

Maximum number of memory-mapped regions a single process is allowed to have. The default is conservative and causes silent failures in some workloads — Proton/Steam games, Elasticsearch, and large Java applications all hit this ceiling.

Check current value:

cat /proc/sys/vm/max_map_count

Increase it:

sudo sysctl -w vm.max_map_count=262144

Make permanent:

echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.d/99-custom.conf

2.7 Pressure Stall Information (PSI)

PSI is kernel telemetry. It tells you exactly how much time processes spent waiting because of CPU, memory, or IO pressure. Not just "how much RAM is free" — but "how much time was actually lost to resource contention".

Check if PSI is available:

cat /proc/pressure/cpu
cat /proc/pressure/memory
cat /proc/pressure/io

The output shows avg10, avg60, and avg300 — percentage of time at least one process was stalled, averaged over 10 seconds, 1 minute, and 5 minutes. Use this for forensics when your system feels slow. Real data, not guesswork.
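The PSI files are plain text, so a short parser is enough to feed the numbers into a script or an alert. The psi_avg10 helper is my own sketch, not a standard tool:

```shell
#!/bin/sh
# Sketch: extract the avg10 figure from a PSI file.
# "some" means at least one task was stalled; memory and io also report
# a "full" line, where all non-idle tasks were stalled at once.
psi_avg10() {
  # $1: "some" or "full"; PSI-format lines arrive on stdin.
  awk -v row="$1" '$1 == row { sub(/^avg10=/, "", $2); print $2 }'
}

if [ -r /proc/pressure/memory ]; then
  v=$(psi_avg10 some < /proc/pressure/memory)
  printf 'memory pressure (some, avg10): %s%%\n' "$v"
fi
```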


3. CPU and Scheduling

3.1 Scheduler autogroup

Autogroup changes how CPU time is distributed. Instead of treating every process equally, the kernel groups processes by terminal session. Each TTY session becomes a group and the scheduler gives equal time to each group — not to each individual process.

Without autogroup: a compile job spawning 100 processes competes with your video player on equal terms. The video stutters.

With autogroup: the compile job and the video player are two groups. Each gets a fair share. The compile continues, the video stays smooth.

Check if it is enabled:

cat /proc/sys/kernel/sched_autogroup_enabled

1 = enabled (default on most distributions). To disable for pure batch workloads:

sudo sysctl -w kernel.sched_autogroup_enabled=0
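You can also inspect, and adjust, the autogroup of a running session through /proc/<pid>/autogroup. The file format is the kernel's; the autogroup_nice parser below is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: read the nice value of the current session's autogroup.
# The file contains a line like "/autogroup-44 nice 0".
autogroup_nice() {
  awk '{print $3}' "$1"
}

if [ -r /proc/self/autogroup ]; then
  printf 'current session autogroup nice: %s\n' \
    "$(autogroup_nice /proc/self/autogroup)"
  # To deprioritise this whole terminal session (positive = lower priority):
  # echo 10 > /proc/self/autogroup
fi
```

Writing a nice level here affects the entire group, which is exactly what you want for "let this compile job yield to everything else" without renicing 100 individual processes.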

3.2 CPU frequency governors

The kernel controls CPU frequency through governors. Check the active governor:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor

List available governors:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors

Available governors and their behaviour:

performance: runs the CPU at its maximum frequency at all times.
powersave: pins the CPU at its minimum frequency with the generic driver (but see the intel_pstate note below).
ondemand: jumps to a high frequency under load and drops back when idle (legacy).
conservative: like ondemand, but steps the frequency up and down gradually.
schedutil: driven directly by the scheduler's utilisation data; the modern default on most systems.
userspace: lets a userspace program set the frequency manually.

Switch all cores to schedutil:

for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo schedutil | sudo tee "$cpu"
done

Going deeper than the video — intel_pstate on modern Intel hardware: the video presents governors as a simple menu of behaviours. That is accurate for the generic acpi-cpufreq driver, which is what most AMD systems and older Intel systems use. However, many modern Intel CPUs use the intel_pstate driver instead. You can check which driver is active:

cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_driver

If you see intel_pstate in active mode, the powersave and performance governors do not behave like their generic counterparts. In this mode, powersave actually implements dynamic scaling similar to schedutil — it is not locking the CPU at the minimum frequency. Only in passive mode (intel_cpufreq) do the generic governor descriptions apply fully. Check the mode:

cat /sys/devices/system/cpu/intel_pstate/status

The practical upshot: on a modern Intel laptop, the governor names are the same but the underlying behaviour is different. Do not assume "powersave = slow" on Intel hardware without checking first.

3.3 CPU security mitigations

The kernel includes mitigations for CPU vulnerabilities: Spectre, Meltdown, L1TF, MDS, and others. The performance cost of CPU security mitigations varies widely by processor generation and workload. In some cases the impact is small, while in syscall-heavy, I/O-heavy, or virtualization-heavy scenarios it can become more noticeable.

Check which mitigations are active on your CPU:

grep -r '' /sys/devices/system/cpu/vulnerabilities/

To disable all mitigations, add this to your kernel command line in /etc/default/grub:

mitigations=off

Then regenerate GRUB (command varies by distro — see the THP section above for the full list).

⚠️ This is not a tweak. This is a security trade-off. Do not disable mitigations if you run untrusted code, browse arbitrary websites, or share the machine with other users. On a single-user offline workstation running trusted software, the real-world risk is low. Know what you are giving up.

3.4 Tickless kernel (NO_HZ)

Traditional kernels interrupt the CPU at a fixed rate (100, 250, or 1000 times per second) to check if rescheduling is needed. Even when nothing is happening. This wastes power and prevents deep sleep states.

Modern kernels support tickless operation — interrupts only fire when actually needed. Check if your kernel supports it:

grep -i "nohz" /boot/config-$(uname -r)

Look for:

CONFIG_NO_HZ_IDLE=y: the tick stops on idle CPUs (the common default).
CONFIG_NO_HZ_FULL=y: the tick can also stop on busy CPUs running a single task (specialist configurations).

This is a compile-time option — you cannot change it at runtime. If your kernel does not have it, you need to recompile (covered in Part 2).

3.5 RCU stall timeout

RCU (Read-Copy-Update) is a kernel synchronization mechanism. The CPU stall timeout controls how long the kernel waits before reporting that a CPU is stuck inside an RCU read section. Default is around 21 seconds.

For most users this is irrelevant. If you are doing kernel debugging or stress-testing scheduling, you may want to increase it via boot parameter:

rcupdate.rcu_cpu_stall_timeout=60

Add to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub.


4. Storage and I/O

4.1 I/O schedulers

The I/O scheduler controls the order in which the kernel processes read and write requests to your storage device. The right scheduler depends on your hardware.

Check the active scheduler for a device:

cat /sys/block/sda/queue/scheduler

Replace sda with your device name (nvme0n1, sdb, etc.).

Available schedulers and when to use them:

none: no reordering at all; requests go straight to the device. Suited to fast NVMe drives.
mq-deadline: simple deadline-based ordering with low overhead; a common default.
kyber: lightweight and latency-targeted; built for fast multi-queue devices.
bfq: fairness-oriented, with per-process budgets; the strongest choice for desktop interactivity on SATA SSDs and HDDs.

The kernel documentation does not declare a universal winner — the right scheduler depends on your hardware and what you are optimising for. BFQ is often a strong choice for desktop responsiveness, especially when interactive smoothness matters more than raw throughput, but it is not a universal winner on every SSD or workload. If you are on NVMe and care primarily about low latency with minimal overhead, kyber or even none may perform better. Test on your own hardware.

Switch at runtime (replace sda with your device):

echo bfq | sudo tee /sys/block/sda/queue/scheduler

To make it permanent, create a udev rule:

sudo nano /etc/udev/rules.d/60-ioscheduler.rules
# BFQ for SATA SSDs and HDDs
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"
 
# Kyber for NVMe
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="kyber"

Reload udev rules:

sudo udevadm control --reload-rules
sudo udevadm trigger
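After reloading the rules, confirm they actually took effect. This sketch prints the active scheduler for every block device; the active entry is the one the kernel shows in brackets, and the active_sched helper is mine:

```shell
#!/bin/sh
# Sketch: report the active I/O scheduler for every block device,
# to verify udev rules after a trigger or a reboot.
active_sched() {
  # The active scheduler is the bracketed entry, e.g. "[bfq] kyber none".
  grep -o '\[[^]]*\]' "$1" | tr -d '[]'
}

for f in /sys/block/*/queue/scheduler; do
  [ -r "$f" ] || continue
  dev=$(basename "$(dirname "$(dirname "$f")")")
  printf '%s: %s\n' "$dev" "$(active_sched "$f")"
done
```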

4.2 Zswap

Zswap is a compressed cache for swap pages. Instead of writing swapped memory straight to disk, the kernel first compresses it and keeps it in RAM. Only when that compressed cache fills up does the data actually go to disk.

Compressing in RAM and decompressing later is much faster than reading uncompressed data from disk — even from a fast SSD. The result is dramatically lower latency under memory pressure.

Check if Zswap is enabled:

cat /sys/module/zswap/parameters/enabled

Correction from the video: in the video I said Zswap must be enabled at boot via the kernel command line. That is not the full picture. Whether Zswap is on by default depends on the compile-time option CONFIG_ZSWAP_DEFAULT_ON. On many distributions it is already compiled in and active. You can also toggle it at runtime without rebooting:

# Enable at runtime
echo 1 | sudo tee /sys/module/zswap/parameters/enabled
 
# Disable at runtime
echo 0 | sudo tee /sys/module/zswap/parameters/enabled

The kernel command line parameter zswap.enabled=1 works as an override at boot and is still useful if you want to guarantee a specific state regardless of the compiled default. Add to GRUB_CMDLINE_LINUX_DEFAULT:

zswap.enabled=1 zswap.compressor=lz4

Then regenerate GRUB (see the THP section above for the command on your distribution).

Note on zpool — z3fold removed from baseline: the previous version of this guide suggested zswap.zpool=z3fold. z3fold has been deprecated in recent stable kernel releases. The current recommended pool allocator is zsmalloc, which is what the kernel uses by default when Zswap is enabled. You do not need to specify it explicitly — leaving it at the default is the correct choice today.

Available compression algorithms:

lzo: the historical default; decent speed, modest ratio.
lz4: very fast compression and decompression, with a slightly lower ratio.
zstd: a better compression ratio at somewhat higher CPU cost.

Combine Zswap with low swappiness for best results — but read the swappiness note above first. If you are using Zswap, swapping into compressed RAM is cheap. Your swappiness decision should reflect that.
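If Zswap is active and debugfs is mounted, you can estimate the real compression ratio it is achieving. The counters under /sys/kernel/debug/zswap are the kernel's (reading them needs root); the zswap_ratio helper and the 4 KB page-size assumption are mine:

```shell
#!/bin/sh
# Sketch: estimate Zswap's effective compression ratio from debugfs.
zswap_ratio() {
  # $1 = stored_pages, $2 = pool_total_size in bytes
  awk -v p="$1" -v s="$2" 'BEGIN {
    if (s > 0) printf "%.2f\n", (p * 4096) / s; else print "n/a"
  }'
}

d=/sys/kernel/debug/zswap
if [ -r "$d/stored_pages" ]; then
  printf 'zswap compression ratio: ~%s:1\n' \
    "$(zswap_ratio "$(cat "$d/stored_pages")" "$(cat "$d/pool_total_size")")"
fi
```

A ratio around 2:1 or better means each swapped page costs half its size in RAM, which is the argument for raising swappiness when Zswap is in play.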

4.3 Read-ahead

When you read a file, the kernel reads extra data beyond what you asked for — assuming you will need it next. This is called read-ahead. Measured in kilobytes.

Check current read-ahead for a device:

cat /sys/block/sda/queue/read_ahead_kb

On modern SSDs, sequential reads are extremely fast. Increasing read-ahead can improve performance for sequential workloads — loading large applications, playing video files, launching games. It is less useful, and can be counterproductive, for random-access workloads where prefetching the wrong data wastes RAM bandwidth.

echo 2048 | sudo tee /sys/block/sda/queue/read_ahead_kb

A value such as 2048 KB can be reasonable to test on some systems, especially for sequential workloads, but read-ahead should be tuned empirically because it can also hurt random I/O patterns. It is not a universal baseline for modern SSDs — measure the difference on your actual workload before making it permanent.
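Before committing to a value, measure. This rough sketch times a sequential read of a scratch file with dd; note that unless you drop the page cache between runs (echo 3 > /proc/sys/vm/drop_caches, which needs root), repeated runs hit the cache and look unrealistically fast:

```shell
#!/bin/sh
# Sketch: rough sequential-read throughput test on a scratch file.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null   # 64 MB scratch file
sync
# Time a sequential read; dd reports throughput on stderr.
dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f "$f"
```

Run it once with the current read_ahead_kb, change the value, run it again, and keep whichever setting wins on your actual workload.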

To make permanent, add to your udev rules file:

ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/read_ahead_kb}="2048"

5. Kernel Forensics — Reading the Signals

Do not trust RAM usage graphs. Free RAM does not mean unused RAM. Full RAM does not mean trouble. Read kernel counters directly.

Watch live system statistics updated every second:

vmstat 1

The columns that matter: si (swap in) and so (swap out). These are rates, not cumulative totals. If they are consistently non-zero, the system is actively swapping. That is a problem.

Read memory counters directly from the kernel:

grep -E "pgmajfault|pswpin|pswpout" /proc/vmstat

If these values are constantly rising during normal use, you have a real problem — not enough RAM, swappiness too high, or too many applications competing for memory simultaneously.

Check memory pressure via PSI:

cat /proc/pressure/memory

This is the difference between debugging by feel and debugging with evidence. Read the signals. Then pull the right lever.


6. Personal Baseline Configuration

The following is not a universal preset. It is a personal baseline for relatively modern desktop systems with SSD storage and at least 16 GB of RAM, and it should be adjusted based on real measurements. Read each section above before applying anything, understand the trade-offs, and adjust to your own situation.

Create the file:

sudo nano /etc/sysctl.d/99-desktop.conf
# Memory
vm.swappiness=10
vm.vfs_cache_pressure=50
vm.dirty_ratio=10
vm.dirty_background_ratio=5
vm.max_map_count=262144
 
# Scheduler
kernel.sched_autogroup_enabled=1

Apply immediately:

sudo sysctl --system

For the bootloader parameters, add to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub:

transparent_hugepage=madvise zswap.enabled=1 zswap.compressor=lz4

Then regenerate your GRUB configuration (command varies by distribution — see the THP section above) and reboot.

If you are using Zswap, reconsider the swappiness value. Swapping into compressed RAM is much cheaper than hitting disk. A value of 10 may be too conservative — you might want 30 or even higher, letting the kernel use Zswap more freely before reaching the disk.


Conclusion

None of this is magic. These are knobs. They exist because the kernel cannot know your hardware, your workload, your priorities. The distribution sets them once, for the average machine. You are not average. Your machine is not average.

Read the documentation. Cross-reference sources. The Arch Wiki sysctl page and the Gentoo Kernel Configuration Guide are the two best references available. Use them.

Part 2 covers recompiling the kernel from source — removing what you do not need, optimising for your exact CPU, making decisions your distribution would never make for you.
