Consider what would happen with hundreds of clients accessing a door concurrently in an implementation where each invocation is served by a separate thread/process taken from a pool. For each client there would be two processes: one is the thread serving the invocation, the other is the client itself, sleeping until the invocation finishes. In particular, entering and leaving a door require a process switch.
This overhead (which can become a bottleneck on a highly loaded system) can be avoided by having the door invocation served by the client itself. More specifically: a Linux process (or, equivalently, task) is represented in the kernel by an instance of the task_struct structure described in /usr/include/linux/sched.h.
A process can be thought of as a thread (a schedulable entity), an address space, and other attributes. To serve a door invocation, the kernel replaces the address-space-related fields of the caller's task_struct with those of the door server. From this moment on the client shares an address space with the server. The old values of the replaced fields are stored in the kernel. When the server finishes the door invocation, the old values are restored and the client returns to its original address space. The fields replaced are:
task_struct.sigpending
task_struct.addr_limit
task_struct.exec_domain
task_struct.mm
task_struct.active_mm
task_struct.binfmt
task_struct.personality
task_struct.euid /* only when door is suid */
task_struct.egid /* only when door is sgid */
task_struct.sig
task_struct.blocked
task_struct.pending
task_struct.sas_ss_sp
task_struct.sas_ss_size
task_struct.notifier
task_struct.notifier_data
task_struct.notifier_mask
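In kernel terms, the swap amounts to saving the listed fields into a kernel-side save area, installing the server's values, and reversing the operation on exit. The following user-space mock sketches the idea for a few of the fields; struct door_saved_ctx and both helper functions are invented for illustration, not the actual LDoors code:

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the relevant task_struct fields (an illustrative subset). */
struct mm_struct;
struct task_struct {
    struct mm_struct *mm;
    struct mm_struct *active_mm;
    unsigned long personality;
    unsigned int euid, egid;
};

/* Hypothetical per-invocation save area kept in kernel memory. */
struct door_saved_ctx {
    struct mm_struct *mm, *active_mm;
    unsigned long personality;
    unsigned int euid, egid;
};

/* On door entry: save the client's fields, install the server's. */
static void door_enter(struct task_struct *client,
                       const struct task_struct *server,
                       struct door_saved_ctx *save)
{
    save->mm          = client->mm;
    save->active_mm   = client->active_mm;
    save->personality = client->personality;
    save->euid        = client->euid;
    save->egid        = client->egid;

    client->mm          = server->mm;
    client->active_mm   = server->active_mm;
    client->personality = server->personality;
    client->euid        = server->euid;  /* only when the door is suid */
    client->egid        = server->egid;  /* only when the door is sgid */
}

/* On door exit: restore the client's original fields. */
static void door_exit(struct task_struct *client,
                      const struct door_saved_ctx *save)
{
    client->mm          = save->mm;
    client->active_mm   = save->active_mm;
    client->personality = save->personality;
    client->euid        = save->euid;
    client->egid        = save->egid;
}
```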
This poses a problem: if all clients run in the same address space, how can the kernel decide where to store the arguments passed? In a traditional implementation each invocation is served by a separate thread with a distinct stack. The task of managing invocation stacks is better done at user level than in the kernel. LDoors communicates this information from user level by means of a simple "bootstrap protocol".
When a server process first sets a door up, it passes to the kernel a description of the door arguments, the address of an entry point function, an exit notification signal, and a number of "bootstrap stacks". Each bootstrap stack is just an area in the virtual memory of the server process. When a client invokes the door, it is switched into the server address space as described above. The arguments passed by the client as part of the invocation are stored in kernel space. Then, while the client process is still in the kernel, the return address that was stored on the kernel stack when the client made the invocation system call is replaced with the address of the entry function (an address in the server address space), and the old return address is saved. The kernel then looks for a free bootstrap stack, stores the appropriate stack pointer as the stack pointer to be used on return from the system call (again saving the old stack pointer), and marks the bootstrap stack as used.
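The setup step might be described by a structure along these lines; all of the names here (struct door_setup, the stack counts, the helper) are hypothetical, since the actual LDoors interface is not given above. The filled-in descriptor would then be handed to the kernel through the door's ioctl():

```c
#include <signal.h>
#include <stddef.h>
#include <stdlib.h>

#define BOOTSTRAP_NSTACKS    4      /* illustrative values */
#define BOOTSTRAP_STACK_SIZE 8192

/* Hypothetical door-setup descriptor; the real LDoors structure
 * and ioctl numbers may well differ. */
struct door_setup {
    void (*entry)(void);               /* entry point in server address space */
    int    exit_sig;                   /* exit-notification signal, 0 = none */
    size_t arg_size;                   /* size of the door arguments */
    int    nstacks;                    /* number of bootstrap stacks */
    void  *stacks[BOOTSTRAP_NSTACKS];  /* bootstrap stack areas */
};

/* Door invocations begin executing here (body omitted in this sketch). */
static void door_entry(void)
{
}

/* Fill in the descriptor and allocate the bootstrap stacks. */
static int prepare_door_setup(struct door_setup *ds, size_t arg_size)
{
    ds->entry    = door_entry;
    ds->exit_sig = SIGUSR1;
    ds->arg_size = arg_size;
    ds->nstacks  = BOOTSTRAP_NSTACKS;
    for (int i = 0; i < BOOTSTRAP_NSTACKS; i++) {
        ds->stacks[i] = malloc(BOOTSTRAP_STACK_SIZE);
        if (ds->stacks[i] == NULL)
            return -1;
    }
    /* The descriptor would now be passed to the kernel via the door
     * device's ioctl(); the command name is not specified above. */
    return 0;
}
```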
Now the client process ultimately returns from the invocation system call (which is actually an ioctl()). But because of the task_struct and register manipulations just described, it finds itself running (in user mode) in the server address space, executing the code of the door entry function on one of the bootstrap stacks. The door entry function should do the following:
allocate a working stack on which the door invocation will be served. This can be done in any way user mode chooses: malloc(), mmap(), or a preallocated pool of stacks;
switch to the working stack (for example, by longjmp());
select some specific "cookie" value;
allocate space for the invocation arguments;
make the "argument passing" call, giving it the cookie and a pointer to the allocated argument space.
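The steps above can be sketched as follows. This sketch uses POSIX ucontext for the stack switch (one alternative to the longjmp() trick mentioned above), and door_pass_args() is a user-space stand-in for the real "argument passing" call; it also simplifies by letting the entry function return, where a real invocation would end with the "door exit" call:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>

#define WORK_STACK_SIZE (64 * 1024)
#define ARG_SIZE        64          /* illustrative argument size */

static char g_args[ARG_SIZE];
static int  g_served;

/* Stand-in for the real "argument passing" call (an ioctl() in LDoors):
 * the kernel would free the bootstrap stack, record the cookie, and copy
 * the stored invocation arguments into the supplied buffer. */
static void door_pass_args(void *cookie, void *argbuf, size_t len)
{
    (void)cookie;
    memset(argbuf, 0xab, len);      /* mock: pretend arguments were copied */
}

/* Runs on the freshly allocated working stack. */
static void serve_invocation(void)
{
    void *cookie = g_args;          /* any value identifying this invocation */
    door_pass_args(cookie, g_args, sizeof g_args);
    g_served = 1;                   /* ... real door work goes here ... */
}

/* Door entry function: starts on a bootstrap stack and moves to a
 * working stack before doing anything substantial. */
static void door_entry(void)
{
    ucontext_t boot, work;
    void *stack = malloc(WORK_STACK_SIZE);
    assert(stack != NULL);

    getcontext(&work);
    work.uc_stack.ss_sp   = stack;
    work.uc_stack.ss_size = WORK_STACK_SIZE;
    work.uc_link          = &boot;  /* resume here when the work is done */
    makecontext(&work, serve_invocation, 0);
    swapcontext(&boot, &work);      /* switch to the working stack */

    free(stack);
}
```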
On the "argument passing" call the kernel marks the appropriate bootstrap stack as free, stores the cookie value, copies the previously stored invocation arguments into user space, and returns.
Now the client process is running in the server address space, on a valid stack, with the arguments passed. Phew! It's time to do the real work. When the door invocation is about to exit, it makes the "door exit" call. On this call the kernel switches the caller back into the saved client address space, restores the program counter and stack pointer, and, before exiting, sends the server process the "exit notification" signal (if configured during door setup), passing the cookie in siginfo_t. This information can be used by the server process to do cleanup, reclaim the working stack, etc.
All this complexity is not wasted. It achieves important goals.
No new thread is created to serve a door invocation. Each thread (== process in Linux) occupies at least 8K of kernel space (task_struct + kernel stack) and lurks in hash tables, queues, etc.
Therefore a door invocation doesn't require a process switch (going through switch()), only a context switch (replacing the user registers and the address-space fields in the task_struct, flushing the TLB, the i386's [GL]DT, etc.). This can be termed in-place direct hand-off scheduling, because execution continues in a context known in advance ("hand-off"), without switch() ("direct"), and without even the creation of a new thread ("in-place").