TGINSIGHT CHAT
Kernel Jungler #1
@k77777777777k
Technologies
Jungling the Linux kernel, squishing bugs away
Recent posts
Posted 26 days ago
You gave me a u32. I gave you root. (io_uring ZCRX freelist LPE)
https://ze3tar.github.io/post-zcrx.html

Yet another LPE??? The title looks slick, but AI slop it is ... Two heavyweights settled the matter on the spot:

Jens Axboe (io_uring maintainer)
> If you already have CAP_SYS_ADMIN, what is the point?

Solar Designer (Openwall)
> That is indeed ridiculous, and puts everything else in this report in (greater) doubt.

https://seclists.org/oss-sec/2026/q2/438
Posted 27 days ago
Subject: Dirty Frag: Universal Linux LPE
https://www.openwall.com/lists/oss-security/2026/05/07/8

Again???
Posted Apr 22
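David H has officially taken over the MM subsystem [1], and the countdown begins 🎉🎉🎉 I sent him my congratulations right away. He admitted the challenge is not a small one, but he is just as excited about the coming year. The end of an old era also means the start of a new one. Thanks to Andrew for all the hard work over the years; MM will keep getting better!

[1] https://lore.kernel.org/linux-mm/[email protected]/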
Posted Apr 20
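Background: a process uses mmap(MAP_SHARED) to map a file into its own address space as a shared mapping. The first touch of a virtual address inside that mapping triggers a page fault. In the fault handler, the kernel looks up the corresponding struct page in the page cache and then fills in the PTE.

faultaround is a read-ahead-like technique on the fault path: while handling the current faulting address, the kernel also maps the neighboring virtual addresses in advance (installing some PTEs early), so that fewer page faults happen later.

In 2016 there was a kernel change, commit 5c0a85fad9 (mm: make faultaround produce old ptes), that made the PTEs installed by faultaround show up as old / not young. In other words, the accessed bit in the PTE starts at 0 and is set by hardware on the first real access. Before that change the bit was set to 1 up front, so hardware never had to set it again (and the CPU-internal microfault [2] was not triggered).

LKP reported a UnixBench / shell8 score regression of about 6.3%. The community [1] narrowed the main cause down to this: accessed-bit updates on Intel are far more expensive than they look on paper. The change was later reverted on x86.

And that revert is exactly what bit me hard and made me question everything ... Last week I helped Song (宋老师) test [3]: on x86 I saw no gain on my side, while others saw huge improvements on arm64 ...

[1] https://lore.kernel.org/lkml/20160606022724.GA26227@yexl-desktop/
[2] https://lore.kernel.org/lkml/CA+55aFy4oYis6HTu7o4YwiFawRtDOPO=87v8oHZdTFS+BjnA8g@mail.gmail.com/
[3] https://lore.kernel.org/linux-mm/[email protected]/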
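To make the faultaround idea concrete, here is a minimal userspace sketch of a simplified model, not the kernel's implementation. On a fault, the window containing the faulting page is aligned down to the window size, clipped to the mapping, and treated as mapped in one go. The names `fa_window`, `fault_around_window`, and `count_sequential_faults` are hypothetical, and the model ignores things the real kernel also does, such as clipping at page-table boundaries and only mapping pages that are already in the page cache.

```c
#include <assert.h>

/* Hypothetical type for this sketch; not a kernel data structure. */
struct fa_window {
    unsigned long start; /* first page index mapped (inclusive) */
    unsigned long end;   /* one past the last page index mapped */
};

/*
 * Simplified faultaround model: align the faulting page index down to the
 * window size, then clip the window to the mapping [vma_start, vma_end).
 * All values are page indices, not byte addresses.
 */
struct fa_window fault_around_window(unsigned long fault_page,
                                     unsigned long vma_start,
                                     unsigned long vma_end,
                                     unsigned long window_pages)
{
    struct fa_window w;

    assert(window_pages > 0);
    assert(fault_page >= vma_start && fault_page < vma_end);

    w.start = fault_page - (fault_page % window_pages);
    w.end = w.start + window_pages;
    if (w.start < vma_start)
        w.start = vma_start;
    if (w.end > vma_end)
        w.end = vma_end;
    return w;
}

/*
 * Count how many faults a sequential scan of the whole mapping takes in
 * this model: each fault maps its window, and the scan resumes at the
 * first page after that window.
 */
unsigned long count_sequential_faults(unsigned long vma_start,
                                      unsigned long vma_end,
                                      unsigned long window_pages)
{
    unsigned long faults = 0;
    unsigned long page = vma_start;

    while (page < vma_end) {
        struct fa_window w =
            fault_around_window(page, vma_start, vma_end, window_pages);
        faults++;
        page = w.end;
    }
    return faults;
}
```

In this model, scanning a 64-page mapping with an illustrative 16-page window takes 4 faults, versus 64 with a 1-page window (no faultaround). Whether each of those pre-installed PTEs starts with the accessed bit set or clear is the part the 2016 commit changed.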
Posted Apr 1
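LOL. This afternoon I [1] and another guy [2] were fixing the same bug at the same time. He submitted slightly earlier than I did (by a few minutes), but his fix was further off the mark. After reading both, M said: let's move the discussion over to Lance's thread, the approach there is more reasonable ... Once the back-and-forth was done, the fix naturally landed in my hands. Sweet (the community is full of human politics ...)

[1] https://lore.kernel.org/linux-mm/[email protected]/
[2] https://lore.kernel.org/all/[email protected]/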
Posted Mar 31
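https://lore.kernel.org/linux-mm/[email protected]/

Here's a first RFC. Any TLB flush experts out there willing to give some pointers? 🫡 So far only x86 is optimized; I haven't figured out the other architectures yet ...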
Posted Mar 26
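Leaving a kernel feature request here for whoever feels like sending an RFC: I'd like newly created net namespaces to be able to inherit a set of TCP default parameters from init_net.

Container setups have a very real pain point today: the host has its TCP parameters tuned, but newly started containers do not pick them up by default. In theory each workload could configure them itself through an init script, but once that has to be rolled out widely, the effort and communication cost get absurd. Pushing a whole round of service teams to change code just to alter a few default parameters is really painful ...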
Posted Mar 12
Thanks, open source. Happily grabbing the freebies :D
Posted Mar 10
[People] Ziyao (梓瑶): the boundary-breaking road from open-source enthusiast to RISC-V SoC maintainer
Posted Mar 2
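Muchun pointed me in the right direction. Folks, if you want to understand reordering from the hardware side, let's study together and get the fundamentals solid ~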
Posted Feb 26
There's a CPU reordering question I couldn't figure out for days, so I quietly sent Grandpa Paul a private message to ask. I'm now on my fourth read of his reply and still haven't digested it ... A picture makes it clearer; see the comments ~

> > Sorry to bother you with this question. I'm looking into gup_fast
> > (software/lockless page table walker) and wondering about CPU memory
> > reordering.
> >
> > In gup_fast (mm/gup.c), the pattern is:
> >
> > static unsigned long gup_fast(...)
> > {
> >         unsigned long flags;
> >
> >         local_irq_save(flags);          // [1] Save flags and CLI
> >         gup_fast_pgd_range(...);        // [2] Read page tables
> >                 pmd = pmdp_get_lockless(pmdp);
> >                 pte = ptep_get_lockless(ptep);
> >         local_irq_restore(flags);       // [3] Restore flags (maybe STI)
> >
> >         return nr_pinned;
> > }
> >
> > And my question is:
> >
> > Can the CPU reorder the page table reads ([2]) to execute:
> > 1. Before local_irq_save() ([1])? (reorder up/earlier)
> > 2. After local_irq_restore() ([3])? (reorder down/later)
> >
> > IIUC, local_irq_* is just a compiler barrier, so that could happen. And
> > gup_fast design relies on page table modifiers sending IPIs that get
> > delayed while interrupts are disabled, that would break ...
> >
> > Does the function call boundary (gup_fast_pgd_range) provide sufficient
> > ordering guarantees? But what if it's inlined.

Believe it or not, the answer is "no, but yes".

Yes, in theory, the CPU could speculatively execute normal memory accesses in gup_fast_pgd_range() before local_irq_save(). And maybe some real CPUs really do this.

However, if an IPI arrives, the CPU must get its house in order so that the IPI is taken at a specific place in the code. The act of getting its house in order forces the CPU to either:

1. Abandon the speculation, so that the IPI appears to happen before the local_irq_save().
2. Commit the speculation, at which point the local_irq_save() will have executed, forcing the CPU to defer the IPI until after the local_irq_restore().

Either way, after the dust has settled, the IPI will have been taken either before the local_irq_save() or after the local_irq_restore(), but not between the two.

Of course, this constrains only from the perspective of the current CPU. Additional memory-ordering constraints would be needed if other CPUs needed to see the above accesses in order. But last I checked, this memory-ordering constraint was correctly handled. "Last I checked" admittedly being a very long time ago, so it would not hurt to double-check for recent kernels. 😉

> > Thanks a lot for your time!

No problem, and please let me know how it goes!

Thanx, Paul
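The "before or after, but never in between" outcome has a rough userspace analogy that can actually be run: a signal that arrives while it is blocked stays pending and is only delivered once the mask is restored. The sketch below is only an analogy under stated assumptions, with `sigprocmask()` standing in for local_irq_save()/local_irq_restore() and SIGUSR1 standing in for the IPI. It says nothing about CPU speculation, and the names `demo_deferred_signal` and `deferral_result` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Incremented by the handler; sig_atomic_t is safe to touch from a handler. */
static volatile sig_atomic_t handler_runs;

static void on_usr1(int signo)
{
    (void)signo;
    handler_runs++;
}

/* Hypothetical result type for this sketch. */
struct deferral_result {
    int seen_inside; /* handler runs observed while the signal was blocked */
    int seen_after;  /* handler runs observed after the mask was restored */
};

/* Returns 0 on success, -1 on error. Assumes a single-threaded caller. */
int demo_deferred_signal(struct deferral_result *out)
{
    struct sigaction sa, old_sa;
    sigset_t block, saved;

    assert(out != NULL);

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, &old_sa) != 0)
        return -1;

    handler_runs = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);

    /* Analogous to local_irq_save(): mask the "interrupt". */
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return -1;

    /* The "IPI" arrives inside the critical section and stays pending. */
    raise(SIGUSR1);
    out->seen_inside = handler_runs;

    /* Analogous to local_irq_restore(): the pending signal is delivered. */
    if (sigprocmask(SIG_SETMASK, &saved, NULL) != 0)
        return -1;
    out->seen_after = handler_runs;

    /* Put the previous SIGUSR1 disposition back. */
    if (sigaction(SIGUSR1, &old_sa, NULL) != 0)
        return -1;
    return 0;
}
```

When SIGUSR1 is not already blocked by the caller, `seen_inside` stays 0 and `seen_after` becomes 1: the handler never runs between the two mask calls, which is the property gup_fast relies on for IPIs.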
Posted Feb 24
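https://lore.kernel.org/linux-mm/[email protected]/

Here's a first RFC. Any TLB flush experts out there willing to give some pointers? 🫡 So far only x86 is optimized; I haven't figured out the other architectures yet ...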