H A D | binfmt_elf_fdpic.c | diff 05f47fda9fc5b17bfab189e9d54228025befc996 Fri Mar 05 15:44:05 CST 2010 Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> coredump: unify dump_seek() implementations for each binfmt_*.c
The current ELF dumper can produce broken corefiles if program headers exceed 65535. In particular, the program in 64-bit environment often demands more than 65535 mmaps. If you google max_map_count, then you can find many users facing this problem.
Solaris has already dealt with this issue, and other OSes have also adopted the same method as in Solaris. Currently, Sun's document and AMD 64 ABI include the description for the extension, where they call the extension Extended Numbering. See Reference for further information.
I believe that linux kernel should adopt the same way as they did, so I've written this patch.
I am also preparing for patches of GDB and binutils.
How to fix ==========
In new dumping process, there are two cases according to weather or not the number of program headers is equal to or more than 65535.
- if less than 65535, the produced corefile format is exactly the same as the ordinary one.
- if equal to or more than 65535, then e_phnum field is set to newly introduced constant PN_XNUM(0xffff) and the actual number of program headers is set to sh_info field of the section header at index 0.
Compatibility Concern =====================
* As already mentioned in Summary, Sun and AMD64 has already adopted this. See Reference.
* There are four combinations according to whether kernel and userland tools are respectively modified or not. The next table summarizes shortly for each combination.
--------------------------------------------- Original Kernel | Modified Kernel --------------------------------------------- < 65535 | >= 65535 | < 65535 | >= 65535 ------------------------------------------------------------- Original Tools | OK | broken | OK | broken (#) ------------------------------------------------------------- Modified Tools | OK | broken | OK | OK -------------------------------------------------------------
Note that there is no case that `OK' changes to `broken'.
(#) Although this case remains broken, O-M behaves better than O-O. That is, while in O-O case e_phnum field would be extremely small due to integer overflow, in O-M case it is guaranteed to be at least 65535 by being set to PN_XNUM(0xFFFF), much closer to the actual correct value than the O-O case.
Test Program ============
Here is a test program mkmmaps.c that is useful to produce the corefile with many mmaps. To use this, please take the following steps:
$ ulimit -c unlimited $ sysctl vm.max_map_count=70000 # default 65530 is too small $ sysctl fs.file-max=70000 $ mkmmaps 65535
Then, the program will abort and a corefile will be generated.
If failed, there are two cases according to the error message displayed.
* ``out of memory'' means vm.max_map_count is still smaller
* ``too many open files'' means fs.file-max is still smaller
So, please change it to a larger value, and then retry it.
mkmmaps.c == #include <stdio.h> #include <stdlib.h> #include <sys/mman.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char **argv) { int maps_num; if (argc < 2) { fprintf(stderr, "mkmmaps [number of maps to be created]\n"); exit(1); } if (sscanf(argv[1], "%d", &maps_num) == EOF) { perror("sscanf"); exit(2); } if (maps_num < 0) { fprintf(stderr, "%d is invalid\n", maps_num); exit(3); } for (; maps_num > 0; --maps_num) { if (MAP_FAILED == mmap((void *)NULL, (size_t) 1, PROT_READ, MAP_SHARED | MAP_ANONYMOUS, (int) -1, (off_t) NULL)) { perror("mmap"); exit(4); } } abort(); { char buffer[128]; sprintf(buffer, "wc -l /proc/%u/maps", getpid()); system(buffer); } return 0; }
Tested on i386, ia64 and um/sys-i386. Built on sh4 (which covers fs/binfmt_elf_fdpic.c)
References ==========
- Sun microsystems: Linker and Libraries. Part No: 817-1984-17, September 2008. URL: http://docs.sun.com/app/docs/doc/817-1984
- System V ABI AMD64 Architecture Processor Supplement Draft Version 0.99., May 11, 2009. URL: http://www.x86-64.org/
This patch:
There are three different definitions for dump_seek() functions in binfmt_aout.c, binfmt_elf.c and binfmt_elf_fdpic.c, respectively. The only for binfmt_elf.c.
My next patch will move dump_seek() into a header file in order to share the same implementations for dump_write() and dump_seek(). As the first step, this patch unify these three definitions for dump_seek() by applying the past commits that have been applied only for binfmt_elf.c.
Specifically, the modification made here is part of the following commits:
* d025c9db7f31fc0554ce7fb2dfc78d35a77f3487 * 7f14daa19ea36b200d237ad3ac5826ae25360461
This patch does not change a shape of corefiles.
Signed-off-by: Daisuke HATAYAMA <d.hatayama@jp.fujitsu.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: David Howells <dhowells@redhat.com> Cc: Greg Ungerer <gerg@snapgear.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andi Kleen <andi@firstfloor.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
H A D | binfmt_elf.c | diff d025c9db7f31fc0554ce7fb2dfc78d35a77f3487 Sun Oct 01 01:29:28 CDT 2006 Andi Kleen <ak@suse.de> [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern
Using the infrastructure created in previous patches implement support to pipe core dumps into programs.
This is done by overloading the existing core_pattern sysctl with a new syntax:
|program
When the first character of the pattern is a '|' the kernel will instead threat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file.
This is useful for having automatic core dump analysis without filling up disks. The program can do some simple analysis and save only a summary of the core dump.
The core dump proces will run with the privileges and in the name space of the process that caused the core dump.
I also increased the core pattern size to 128 bytes so that longer command lines fit.
Most of the changes comes from allowing core dumps without seeks. They are fairly straight forward though.
One small incompatibility is that if someone had a core pattern previously that started with '|' they will get suddenly new behaviour. I think that's unlikely to be a real problem though.
Additional background:
> Very nice, do you happen to have a program that can accept this kind of > input for crash dumps? I'm guessing that the embedded people will > really want this functionality.
I had a cheesy demo/prototype. Basically it wrote the dump to a file again, ran gdb on it to get a backtrace and wrote the summary to a shared directory. Then there was a simple CGI script to generate a "top 10" crashes HTML listing.
Unfortunately this still had the disadvantage to needing full disk space for a dump except for deleting it afterwards (in fact it was worse because over the pipe holes didn't work so if you have a holey address map it would require more space).
Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as cores (at least it worked with zsh's =(cat core) syntax), so it would be likely possible to do it without temporary space with a simple wrapper that calls it in the right way. I ran out of time before doing that though.
The demo prototype scripts weren't very good. If there is really interest I can dig them out (they are currently on a laptop disk on the desk with the laptop itself being in service), but I would recommend to rewrite them for any serious application of this and fix the disk space problem.
Also to be really useful it should probably find a way to automatically fetch the debuginfos (I cheated and just installed them in advance). If nobody else does it I can probably do the rewrite myself again at some point.
My hope at some point was that desktops would support it in their builtin crash reporters, but at least the KDE people I talked too seemed to be happy with their user space only solution.
Alan sayeth:
I don't believe that piping as such as neccessarily the right model, but the ability to intercept and processes core dumps from user space is asked for by many enterprise users as well. They want to know about, capture, analyse and process core dumps, often centrally and in automated form.
[akpm@osdl.org: loff_t != unsigned long] Signed-off-by: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
H A D | exec.c | diff d025c9db7f31fc0554ce7fb2dfc78d35a77f3487 Sun Oct 01 01:29:28 CDT 2006 Andi Kleen <ak@suse.de> [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern
Using the infrastructure created in previous patches implement support to pipe core dumps into programs.
This is done by overloading the existing core_pattern sysctl with a new syntax:
|program
When the first character of the pattern is a '|' the kernel will instead threat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file.
This is useful for having automatic core dump analysis without filling up disks. The program can do some simple analysis and save only a summary of the core dump.
The core dump proces will run with the privileges and in the name space of the process that caused the core dump.
I also increased the core pattern size to 128 bytes so that longer command lines fit.
Most of the changes comes from allowing core dumps without seeks. They are fairly straight forward though.
One small incompatibility is that if someone had a core pattern previously that started with '|' they will get suddenly new behaviour. I think that's unlikely to be a real problem though.
Additional background:
> Very nice, do you happen to have a program that can accept this kind of > input for crash dumps? I'm guessing that the embedded people will > really want this functionality.
I had a cheesy demo/prototype. Basically it wrote the dump to a file again, ran gdb on it to get a backtrace and wrote the summary to a shared directory. Then there was a simple CGI script to generate a "top 10" crashes HTML listing.
Unfortunately this still had the disadvantage to needing full disk space for a dump except for deleting it afterwards (in fact it was worse because over the pipe holes didn't work so if you have a holey address map it would require more space).
Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as cores (at least it worked with zsh's =(cat core) syntax), so it would be likely possible to do it without temporary space with a simple wrapper that calls it in the right way. I ran out of time before doing that though.
The demo prototype scripts weren't very good. If there is really interest I can dig them out (they are currently on a laptop disk on the desk with the laptop itself being in service), but I would recommend to rewrite them for any serious application of this and fix the disk space problem.
Also to be really useful it should probably find a way to automatically fetch the debuginfos (I cheated and just installed them in advance). If nobody else does it I can probably do the rewrite myself again at some point.
My hope at some point was that desktops would support it in their builtin crash reporters, but at least the KDE people I talked too seemed to be happy with their user space only solution.
Alan sayeth:
I don't believe that piping as such as neccessarily the right model, but the ability to intercept and processes core dumps from user space is asked for by many enterprise users as well. They want to know about, capture, analyse and process core dumps, often centrally and in automated form.
[akpm@osdl.org: loff_t != unsigned long] Signed-off-by: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|
H A D | sysctl.c | diff d025c9db7f31fc0554ce7fb2dfc78d35a77f3487 Sun Oct 01 01:29:28 CDT 2006 Andi Kleen <ak@suse.de> [PATCH] Support piping into commands in /proc/sys/kernel/core_pattern
Using the infrastructure created in previous patches implement support to pipe core dumps into programs.
This is done by overloading the existing core_pattern sysctl with a new syntax:
|program
When the first character of the pattern is a '|' the kernel will instead threat the rest of the pattern as a command to run. The core dump will be written to the standard input of that program instead of to a file.
This is useful for having automatic core dump analysis without filling up disks. The program can do some simple analysis and save only a summary of the core dump.
The core dump proces will run with the privileges and in the name space of the process that caused the core dump.
I also increased the core pattern size to 128 bytes so that longer command lines fit.
Most of the changes comes from allowing core dumps without seeks. They are fairly straight forward though.
One small incompatibility is that if someone had a core pattern previously that started with '|' they will get suddenly new behaviour. I think that's unlikely to be a real problem though.
Additional background:
> Very nice, do you happen to have a program that can accept this kind of > input for crash dumps? I'm guessing that the embedded people will > really want this functionality.
I had a cheesy demo/prototype. Basically it wrote the dump to a file again, ran gdb on it to get a backtrace and wrote the summary to a shared directory. Then there was a simple CGI script to generate a "top 10" crashes HTML listing.
Unfortunately this still had the disadvantage to needing full disk space for a dump except for deleting it afterwards (in fact it was worse because over the pipe holes didn't work so if you have a holey address map it would require more space).
Fortunately gdb seems to be happy to handle /proc/pid/fd/xxx input pipes as cores (at least it worked with zsh's =(cat core) syntax), so it would be likely possible to do it without temporary space with a simple wrapper that calls it in the right way. I ran out of time before doing that though.
The demo prototype scripts weren't very good. If there is really interest I can dig them out (they are currently on a laptop disk on the desk with the laptop itself being in service), but I would recommend to rewrite them for any serious application of this and fix the disk space problem.
Also to be really useful it should probably find a way to automatically fetch the debuginfos (I cheated and just installed them in advance). If nobody else does it I can probably do the rewrite myself again at some point.
My hope at some point was that desktops would support it in their builtin crash reporters, but at least the KDE people I talked too seemed to be happy with their user space only solution.
Alan sayeth:
I don't believe that piping as such as neccessarily the right model, but the ability to intercept and processes core dumps from user space is asked for by many enterprise users as well. They want to know about, capture, analyse and process core dumps, often centrally and in automated form.
[akpm@osdl.org: loff_t != unsigned long] Signed-off-by: Andi Kleen <ak@suse.de> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
|