While studying kernel exploitation, I came across exploits that corrupt kernel pipe structures. I had used pipe(2) before but had never studied its implementation (well, I hadn't actually studied any kernel code until recently).
One thing made me curious: why does the implementation store pipe data in a ring of pages? Why not dynamically allocate a chunk (e.g. with kmalloc) sized to the user's data, especially when the user doesn't fill the entire allocated page?
struct pipe_inode_info {
    ...
    unsigned int head;
    unsigned int tail;
    ...
    struct pipe_buffer *bufs;
    ...
};

struct pipe_buffer {
    ...
    struct page *page;
    ...
};
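To make the "ring of pages" idea concrete, here is a minimal userspace sketch of the same shape: free-running head and tail counters that index an array of page-sized slots. This is only an illustration, not the kernel's code. The names page_ring, slot, ring_push, ring_pop and ring_used are hypothetical, malloc stands in for page allocation, and the slot count is assumed to be a power of two so that masking can replace a modulo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RING_SLOTS 16u    /* assumed power of two, so (i & (RING_SLOTS - 1)) wraps */
#define SLOT_SIZE  4096u  /* stand-in for PAGE_SIZE */

/* Hypothetical userspace analogue of a pipe buffer: one page plus a window into it. */
struct slot {
    unsigned char *page;  /* page-sized backing storage */
    size_t offset;        /* start of the unread data inside the page */
    size_t len;           /* number of unread bytes */
};

/* Hypothetical analogue of the pipe: head and tail only ever increase. */
struct page_ring {
    unsigned int head;    /* next slot the writer fills */
    unsigned int tail;    /* next slot the reader drains */
    struct slot bufs[RING_SLOTS];
};

/* Occupied slots; unsigned subtraction stays correct when the counters wrap. */
unsigned int ring_used(const struct page_ring *r)
{
    return r->head - r->tail;
}

/* Store up to SLOT_SIZE bytes in a fresh slot. Returns bytes stored, 0 if full. */
size_t ring_push(struct page_ring *r, const void *data, size_t len)
{
    assert(ring_used(r) <= RING_SLOTS);
    if (ring_used(r) == RING_SLOTS)
        return 0;
    if (len > SLOT_SIZE)
        len = SLOT_SIZE;

    struct slot *s = &r->bufs[r->head & (RING_SLOTS - 1)];
    s->page = malloc(SLOT_SIZE);
    if (s->page == NULL)
        return 0;
    memcpy(s->page, data, len);
    s->offset = 0;
    s->len = len;
    r->head++;
    return len;
}

/* Read up to cap bytes from the oldest slot; release the slot once it is empty. */
size_t ring_pop(struct page_ring *r, void *out, size_t cap)
{
    if (ring_used(r) == 0)
        return 0;

    struct slot *s = &r->bufs[r->tail & (RING_SLOTS - 1)];
    size_t n = s->len < cap ? s->len : cap;
    memcpy(out, s->page + s->offset, n);
    s->offset += n;
    s->len -= n;
    if (s->len == 0) {
        free(s->page);
        s->page = NULL;
        r->tail++;
    }
    return n;
}
```

Note that a 5-byte write still consumes a whole slot in this sketch, which is exactly the apparent waste that prompted my question.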
I didn’t look for an answer at the time because I was busy trying to solve a CTF challenge. A few days later, I learned about the splice syscall, which allows efficient (zero-copy) transfer of data inside the kernel using pipes. By the time I finished learning about splice, I had accidentally answered my question from a few days earlier.
Because each pipe buffer is backed by a whole page, it can be easily shared and mapped. More importantly, pages from the file cache (the page cache) can be spliced into the ring by reference instead of having their contents copied.
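For reference, this is roughly what using splice(2) looks like from userspace: data goes from a file into a pipe, then from the pipe into another file, without ever passing through a userspace buffer. It is a minimal sketch under a few assumptions: the helper name splice_copy is mine, both descriptors are regular files that support splice, and the output is not opened with O_APPEND. Whether the kernel really avoids every copy depends on the filesystem and kernel version.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: move up to len bytes from in_fd to out_fd through a
 * pipe using splice(2). Returns the number of bytes moved, or -1 on error.
 */
ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* file -> pipe: the kernel attaches data to the pipe's ring */
        ssize_t in = splice(in_fd, NULL, p[1], NULL, len, 0);
        if (in < 0) {
            total = -1;
            break;
        }
        if (in == 0)  /* end of input */
            break;
        len -= (size_t)in;

        /* pipe -> file: drain everything that was just queued */
        while (in > 0) {
            ssize_t out = splice(p[0], NULL, out_fd, NULL, (size_t)in, 0);
            if (out <= 0) {
                total = -1;
                goto done;
            }
            in -= out;
            total += out;
        }
    }

done:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The pipe in the middle is the important part: splice requires one end of every transfer to be a pipe, so the pipe's ring of pages is effectively the in-kernel buffer that the data travels through.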
But of course I needed confirmation - maybe this is a happy coincidence and the pipe’s design has nothing to do with splice ¯\(ヅ)/¯
I spent some time digging through the kernel archives and, sure enough, I found my answer: the kernel's pipe implementation was intentionally designed to use a circular list of pages in order to support splice.
Jan 2005
Make pipe data structure be a circular list of pages...
This improves pipe throughput, and allows us to (eventually) use these lists of page buffers for moving data around efficiently.
.. one year later ..
March 2006
references: