#
98006636 |
| 06-Jul-2015 |
Mauro Carvalho Chehab <mchehab@osg.samsung.com> |
Merge tag 'v4.2-rc1' into patchwork
Linux 4.2-rc1
* tag 'v4.2-rc1': (12415 commits) Linux 4.2-rc1 bluetooth: fix list handling 9p: cope with bogus responses from server in p9_client_{read,wri
Merge tag 'v4.2-rc1' into patchwork
Linux 4.2-rc1
* tag 'v4.2-rc1': (12415 commits) Linux 4.2-rc1 bluetooth: fix list handling 9p: cope with bogus responses from server in p9_client_{read,write} p9_client_write(): avoid double p9_free_req() 9p: forgetting to cancel request on interrupted zero-copy RPC dax: bdev_direct_access() may sleep block: Add support for DAX reads/writes to block devices dax: Use copy_from_iter_nocache dax: Add block size note to documentation NTB: Add split BAR output for debugfs stats NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe NTB: Print driver name and version in module init NTB: Increase transport MTU to 64k from 16k NTB: Rename Intel code names to platform names NTB: Default to CPU memcpy for performance NTB: Improve performance with write combining NTB: Use NUMA memory in Intel driver NTB: Use NUMA memory and DMA chan in transport NTB: Rate limit ntb_qp_link_work NTB: Add tool test client ...
show more ...
|
Revision tags: v4.2-rc1 |
|
#
e0456717 |
| 24-Jun-2015 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmve
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
1) Add TX fast path in mac80211, from Johannes Berg.
2) Add TSO/GRO support to ibmveth, from Thomas Falcon
3) Move away from cached routes in ipv6, just like ipv4, from Martin KaFai Lau.
4) Lots of new rhashtable tests, from Thomas Graf.
5) Run ingress qdisc lockless, from Alexei Starovoitov.
6) Allow servers to fetch TCP packet headers for SYN packets of new connections, for fingerprinting. From Eric Dumazet.
7) Add mode parameter to pktgen, for testing receive. From Alexei Starovoitov.
8) Cache access optimizations via simplifications of build_skb(), from Alexander Duyck.
9) Move page frag allocator under mm/, also from Alexander.
10) Add xmit_more support to hv_netvsc, from KY Srinivasan.
11) Add a counter guard in case we try to perform endless reclassify loops in the packet scheduler.
12) Extern flow dissector to be programmable and use it in new "Flower" classifier. From Jiri Pirko.
13) AF_PACKET fanout rollover fixes, performance improvements, and new statistics. From Willem de Bruijn.
14) Add netdev driver for GENEVE tunnels, from John W Linville.
15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.
16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.
17) Add an ECN retry fallback for the initial TCP handshake, from Daniel Borkmann.
18) Add tail call support to BPF, from Alexei Starovoitov.
19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.
20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.
21) Favor even port numbers for allocation to connect() requests, and odd port numbers for bind(0), in an effort to help avoid ip_local_port_range exhaustion. From Eric Dumazet.
22) Add Cavium ThunderX driver, from Sunil Goutham.
23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata, from Alexei Starovoitov.
24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.
25) Double TCP Small Queues default to 256K to accomodate situations like the XEN driver and wireless aggregation. From Wei Liu.
26) Add more entropy inputs to flow dissector, from Tom Herbert.
27) Add CDG congestion control algorithm to TCP, from Kenneth Klette Jonassen.
28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.
29) Track and act upon link status of ipv4 route nexthops, from Andy Gospodarek.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits) bridge: vlan: flush the dynamically learned entries on port vlan delete bridge: multicast: add a comment to br_port_state_selection about blocking state net: inet_diag: export IPV6_V6ONLY sockopt stmmac: troubleshoot unexpected bits in des0 & des1 net: ipv4 sysctl option to ignore routes when nexthop link is down net: track link-status of ipv4 nexthops net: switchdev: ignore unsupported bridge flags net: Cavium: Fix MAC address setting in shutdown state drivers: net: xgene: fix for ACPI support without ACPI ip: report the original address of ICMP messages net/mlx5e: Prefetch skb data on RX net/mlx5e: Pop cq outside mlx5e_get_cqe net/mlx5e: Remove mlx5e_cq.sqrq back-pointer net/mlx5e: Remove extra spaces net/mlx5e: Avoid TX CQE generation if more xmit packets expected net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq() net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device ...
show more ...
|
#
ec3b34e1 |
| 22-Jun-2015 |
Jiri Kosina <jkosina@suse.cz> |
Merge branches 'for-4.2/i2c-hid', 'for-4.2/lenovo', 'for-4.2/plantronics', 'for-4.2/rmi', 'for-4.2/sensor-hub', 'for-4.2/sjoy', 'for-4.2/sony' and 'for-4.2/wacom' into for-linus
Conflicts: drivers/
Merge branches 'for-4.2/i2c-hid', 'for-4.2/lenovo', 'for-4.2/plantronics', 'for-4.2/rmi', 'for-4.2/sensor-hub', 'for-4.2/sjoy', 'for-4.2/sony' and 'for-4.2/wacom' into for-linus
Conflicts: drivers/hid/wacom_wac.c
show more ...
|
Revision tags: v4.1 |
|
#
9f42c8b3 |
| 15-Jun-2015 |
David S. Miller <davem@davemloft.net> |
Merge branch 'bpf-share-helpers'
Alexei Starovoitov says:
==================== v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy
Introduce new helpers to access 'struct ta
Merge branch 'bpf-share-helpers'
Alexei Starovoitov says:
==================== v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy
Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm fields in tracing and networking.
Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between tracing and networking. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.1-rc8 |
|
#
ab1973d3 |
| 12-Jun-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
bpf: let kprobe programs use bpf_get_smp_processor_id() helper
It's useful to do per-cpu histograms.
Suggested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Alexei Starovoitov <ast@
bpf: let kprobe programs use bpf_get_smp_processor_id() helper
It's useful to do per-cpu histograms.
Suggested-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
0756ea3e |
| 12-Jun-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
bpf: allow networking programs to use bpf_trace_printk() for debugging
bpf_trace_printk() is a helper function used to debug eBPF programs. Let socket and TC programs use it as well. Note, it's DEBU
bpf: allow networking programs to use bpf_trace_printk() for debugging
bpf_trace_printk() is a helper function used to debug eBPF programs. Let socket and TC programs use it as well. Note, it's DEBUG ONLY helper. If it's used in the program, the kernel will print warning banner to make sure users don't use it in production.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
ffeedafb |
| 12-Jun-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
bpf: introduce current->pid, tgid, uid, gid, comm accessors
eBPF programs attached to kprobes need to filter based on current->pid, uid and other fields, so introduce helper functions:
u64 bpf_get_
bpf: introduce current->pid, tgid, uid, gid, comm accessors
eBPF programs attached to kprobes need to filter based on current->pid, uid and other fields, so introduce helper functions:
u64 bpf_get_current_pid_tgid(void) Return: current->tgid << 32 | current->pid
u64 bpf_get_current_uid_gid(void) Return: current_gid << 32 | current_uid
bpf_get_current_comm(char *buf, int size_of_buf) stores current->comm into buf
They can be used from the programs attached to TC as well to classify packets based on current task fields.
Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
206c59d1 |
| 10-Jun-2015 |
Johannes Berg <johannes.berg@intel.com> |
Merge remote-tracking branch 'net-next/master' into mac80211-next
Merge back net-next to get wireless driver changes (from Kalle) to be able to create the API change across all trees properly.
Sign
Merge remote-tracking branch 'net-next/master' into mac80211-next
Merge back net-next to get wireless driver changes (from Kalle) to be able to create the API change across all trees properly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
show more ...
|
#
6724af48 |
| 09-Jun-2015 |
Mark Brown <broonie@kernel.org> |
Merge branch 'fix/fsl-dspi' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi into spi-fsl-dspi
|
Revision tags: v4.1-rc7, v4.1-rc6 |
|
#
17ca8cbf |
| 29-May-2015 |
Daniel Borkmann <daniel@iogearbox.net> |
ebpf: allow bpf_ktime_get_ns_proto also for networking
As this is already exported from tracing side via commit d9847d310ab4 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as w
ebpf: allow bpf_ktime_get_ns_proto also for networking
As this is already exported from tracing side via commit d9847d310ab4 ("tracing: Allow BPF programs to call bpf_ktime_get_ns()"), we might as well want to move it to the core, so also networking users can make use of it, e.g. to measure diffs for certain flows from ingress/egress.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.1-rc5 |
|
#
3f55b7ed |
| 21-May-2015 |
David S. Miller <davem@davemloft.net> |
Merge branch 'ebpf-tail-call'
Alexei Starovoitov says:
==================== bpf: introduce bpf_tail_call() helper
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used
Merge branch 'ebpf-tail-call'
Alexei Starovoitov says:
==================== bpf: introduce bpf_tail_call() helper
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs.
Use cases: - simplify complex programs - dispatch into other programs (for example: index in jump table can be syscall number or network protocol) - build dynamic chains of programs
The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32.
patch 1 - support bpf_tail_call() in interpreter patch 2 - support in x64 JIT We've discussed what's neccessary to support it in arm64/s390 JITs and it looks fine. patch 3 - sample example for tracing patch 4 - sample example for networking
More details in every patch.
This set went through several iterations of reviews/fixes and older attempts can be seen: https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/log/?h=tail_call_v[123456] - tail_call_v1 does it without touching JITs but introduces overhead for all programs that don't use this helper function. - tail_call_v2 still has some overhead and x64 JIT does full stack unwind (prologue skipping optimization wasn't there) - tail_call_v3 reuses 'call' instruction encoding and has interpreter overhead for every normal call - tail_call_v4 fixes above architectural shortcomings and v5,v6 fix few more bugs
This last tail_call_v6 approach seems to be the best. ====================
Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
#
04fd61ab |
| 19-May-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
bpf: allow bpf programs to tail-call other bpf programs
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) {
bpf: allow bpf programs to tail-call other bpf programs
introduce bpf_tail_call(ctx, &jmp_table, index) helper function which can be used from BPF programs like: int bpf_prog(struct pt_regs *ctx) { ... bpf_tail_call(ctx, &jmp_table, index); ... } that is roughly equivalent to: int bpf_prog(struct pt_regs *ctx) { ... if (jmp_table[index]) return (*jmp_table[index])(ctx); ... } The important detail that it's not a normal call, but a tail call. The kernel stack is precious, so this helper reuses the current stack frame and jumps into another BPF program without adding extra call frame. It's trivially done in interpreter and a bit trickier in JITs. In case of x64 JIT the bigger part of generated assembler prologue is common for all programs, so it is simply skipped while jumping. Other JITs can do similar prologue-skipping optimization or do stack unwind before jumping into the next program.
bpf_tail_call() arguments: ctx - context pointer jmp_table - one of BPF_MAP_TYPE_PROG_ARRAY maps used as the jump table index - index in the jump table
Since all BPF programs are idenitified by file descriptor, user space need to populate the jmp_table with FDs of other BPF programs. If jmp_table[index] is empty the bpf_tail_call() doesn't jump anywhere and program execution continues as normal.
New BPF_MAP_TYPE_PROG_ARRAY map type is introduced so that user space can populate this jmp_table array with FDs of other bpf programs. Programs can share the same jmp_table array or use multiple jmp_tables.
The chain of tail calls can form unpredictable dynamic loops therefore tail_call_cnt is used to limit the number of calls and currently is set to 32.
Use cases: Acked-by: Daniel Borkmann <daniel@iogearbox.net>
========== - simplify complex programs by splitting them into a sequence of small programs
- dispatch routine For tracing and future seccomp the program may be triggered on all system calls, but processing of syscall arguments will be different. It's more efficient to implement them as: int syscall_entry(struct seccomp_data *ctx) { bpf_tail_call(ctx, &syscall_jmp_table, ctx->nr /* syscall number */); ... default: process unknown syscall ... } int sys_write_event(struct seccomp_data *ctx) {...} int sys_read_event(struct seccomp_data *ctx) {...} syscall_jmp_table[__NR_write] = sys_write_event; syscall_jmp_table[__NR_read] = sys_read_event;
For networking the program may call into different parsers depending on packet format, like: int packet_parser(struct __sk_buff *skb) { ... parse L2, L3 here ... __u8 ipproto = load_byte(skb, ... offsetof(struct iphdr, protocol)); bpf_tail_call(skb, &ipproto_jmp_table, ipproto); ... default: process unknown protocol ... } int parse_tcp(struct __sk_buff *skb) {...} int parse_udp(struct __sk_buff *skb) {...} ipproto_jmp_table[IPPROTO_TCP] = parse_tcp; ipproto_jmp_table[IPPROTO_UDP] = parse_udp;
- for TC use case, bpf_tail_call() allows to implement reclassify-like logic
- bpf_map_update_elem/delete calls into BPF_MAP_TYPE_PROG_ARRAY jump table are atomic, so user space can build chains of BPF programs on the fly
Implementation details: ======================= - high performance of bpf_tail_call() is the goal. It could have been implemented without JIT changes as a wrapper on top of BPF_PROG_RUN() macro, but with two downsides: . all programs would have to pay performance penalty for this feature and tail call itself would be slower, since mandatory stack unwind, return, stack allocate would be done for every tailcall. . tailcall would be limited to programs running preempt_disabled, since generic 'void *ctx' doesn't have room for 'tail_call_cnt' and it would need to be either global per_cpu variable accessed by helper and by wrapper or global variable protected by locks.
In this implementation x64 JIT bypasses stack unwind and jumps into the callee program after prologue.
- bpf_prog_array_compatible() ensures that prog_type of callee and caller are the same and JITed/non-JITed flag is the same, since calling JITed program from non-JITed is invalid, since stack frames are different. Similarly calling kprobe type program from socket type program is invalid.
- jump table is implemented as BPF_MAP_TYPE_PROG_ARRAY to reuse 'map' abstraction, its user space API and all of verifier logic. It's in the existing arraymap.c file, since several functions are shared with regular array map.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
show more ...
|
Revision tags: v4.1-rc4, v4.1-rc3 |
|
#
7ae383be |
| 08-May-2015 |
Ingo Molnar <mingo@kernel.org> |
Merge branch 'linus' into x86/asm, before applying dependent patch
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
Revision tags: v4.1-rc2 |
|
#
b3e5ced6 |
| 27-Apr-2015 |
Mauro Carvalho Chehab <mchehab@osg.samsung.com> |
Merge tag 'v4.1-rc1' into patchwork
Linux 4.1-rc1
* tag 'v4.1-rc1': (11651 commits) Linux 4.1-rc1 x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue v4l: xilinx: fix for includ
Merge tag 'v4.1-rc1' into patchwork
Linux 4.1-rc1
* tag 'v4.1-rc1': (11651 commits) Linux 4.1-rc1 x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue v4l: xilinx: fix for include file movement platform/chrome: chromeos_laptop - instantiate Atmel at primary address RCU pathwalk breakage when running into a symlink overmounting something fix I_DIO_WAKEUP definition direct-io: only inc/dec inode->i_dio_count for file systems fs/9p: fix readdir() Btrfs: prevent list corruption during free space cache processing toshiba_acpi: Do not register vendor backlight when acpi_video bl is available x86: fix special __probe_kernel_write() tail zeroing case crypto: img-hash - CRYPTO_DEV_IMGTEC_HASH should depend on HAS_DMA crypto: x86/sha512_ssse3 - fixup for asm function prototype change nios2: rework cache nios2: Add types.h header required for __u32 type ALSA: hda - fix headset mic detection problem for one more machine eth: bf609 eth clock: add pclk clock for stmmac driver probe blackfin: Wire up missing syscalls Btrfs: fix inode cache writeout ACPI / scan: Add a scan handler for PRP0001 ...
show more ...
|
Revision tags: v4.1-rc1 |
|
#
64131a87 |
| 21-Apr-2015 |
Mauro Carvalho Chehab <mchehab@osg.samsung.com> |
Merge branch 'drm-next-merged' of git://people.freedesktop.org/~airlied/linux into v4l_for_linus
* 'drm-next-merged' of git://people.freedesktop.org/~airlied/linux: (9717 commits) media-bus: Fixup
Merge branch 'drm-next-merged' of git://people.freedesktop.org/~airlied/linux into v4l_for_linus
* 'drm-next-merged' of git://people.freedesktop.org/~airlied/linux: (9717 commits) media-bus: Fixup RGB444_1X12, RGB565_1X16, and YUV8_1X24 media bus format hexdump: avoid warning in test function fs: take i_mutex during prepare_binprm for set[ug]id executables smp: Fix error case handling in smp_call_function_*() iommu-common: Fix PARISC compile-time warnings sparc: Make LDC use common iommu poll management functions sparc: Make sparc64 use scalable lib/iommu-common.c functions Break up monolithic iommu table/lock into finer graularity pools and lock sparc: Revert generic IOMMU allocator. tools/power turbostat: correct dumped pkg-cstate-limit value tools/power turbostat: calculate TSC frequency from CPUID(0x15) on SKL tools/power turbostat: correct DRAM RAPL units on recent Xeon processors tools/power turbostat: Initial Skylake support tools/power turbostat: Use $(CURDIR) instead of $(PWD) and add support for O= option in Makefile tools/power turbostat: modprobe msr, if needed tools/power turbostat: dump MSR_TURBO_RATIO_LIMIT2 tools/power turbostat: use new MSR_TURBO_RATIO_LIMIT names Bluetooth: hidp: Fix regression with older userspace and flags validation config: Enable NEED_DMA_MAP_STATE by default when SWIOTLB is selected perf/x86/intel/pt: Fix and clean up error handling in pt_event_add() ...
That solves several merge conflicts: Documentation/DocBook/media/v4l/subdev-formats.xml Documentation/devicetree/bindings/vendor-prefixes.txt drivers/staging/media/mn88473/mn88473.c include/linux/kconfig.h include/uapi/linux/media-bus-format.h
The ones at subdev-formats.xml and media-bus-format.h are not trivial. That's why we opted to merge from DRM.
show more ...
|
#
2c33ce00 |
| 19-Apr-2015 |
Dave Airlie <airlied@redhat.com> |
Merge Linus master into drm-next
The merge is clean, but the arm build fails afterwards, due to API changes in the regulator tree.
I've included the patch into the merge to fix the build.
Signed-o
Merge Linus master into drm-next
The merge is clean, but the arm build fails afterwards, due to API changes in the regulator tree.
I've included the patch into the merge to fix the build.
Signed-off-by: Dave Airlie <airlied@redhat.com>
show more ...
|
#
6c8a53c9 |
| 14-Apr-2015 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar: "Core kernel changes:
- One of the more interesting features in t
Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf changes from Ingo Molnar: "Core kernel changes:
- One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes.
This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.)
(Alexei Starovoitov)
- Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf.
This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks:
- cluster wide profiling
- for system wide tracing with user-space events,
- JIT profiling events
etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al.
(Peter Zijlstra)
Hardware enablement kernel changes:
- x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs.
The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous.
This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2.
(Alexander Shishkin, Peter Zijlstra)
- x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads.
These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.)
(Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr)
- x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option:
perf record --call-graph lbr perf report
or:
perf top --call-graph lbr
This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations:
- It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time.
- It is only available for user-space callchains.
(Yan, Zheng)
- x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models.
(Andi Kleen)
- x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent.
(Maria Dimakopoulou, Stephane Eranian)
The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above:
User visible changes affecting all tools:
- Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo)
User visible changes in individual tools:
'perf data':
New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior)
'perf diff':
Add --kallsyms option (David Ahern)
'perf list':
Allow listing events with 'tracepoint' prefix (Yunlong Song)
Sort the output of the command (Yunlong Song)
'perf kmem':
Respect -i option (Jiri Olsa)
Print big numbers using thousands' group (Namhyung Kim)
Allow -v option (Namhyung Kim)
Fix alignment of slab result table (Namhyung Kim)
'perf probe':
Support multiple probes on different binaries on the same command line (Masami Hiramatsu)
Support unnamed union/structure members data collection. (Masami Hiramatsu)
Check kprobes blacklist when adding new events. (Masami Hiramatsu)
'perf record':
Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)
Support recording running/enabled time (Andi Kleen)
'perf sched':
Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song)
'perf report' and 'perf top':
Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo)
Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo)
Add pid/tid filtering to 'report' and 'script' commands (David Ahern)
Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo)
'perf stat':
Report unsupported events properly (Suzuki K. Poulose)
Output running time and run/enabled ratio in CSV mode (Andi Kleen)
'perf trace':
Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo)
Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo)
Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo)
Dump stack on segfaults (Arnaldo Carvalho de Melo)
No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo)
Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo)
There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
show more ...
|
Revision tags: v4.0, v4.0-rc7, v4.0-rc6 |
|
#
9c959c86 |
| 25-Mar-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
tracing: Allow BPF programs to call bpf_trace_printk()
Debugging of BPF programs needs some form of printk from the program, so let programs call limited trace_printk() with %d %u %x %p modifiers on
tracing: Allow BPF programs to call bpf_trace_printk()
Debugging of BPF programs needs some form of printk from the program, so let programs call limited trace_printk() with %d %u %x %p modifiers only.
Similar to kernel modules, during program load verifier checks whether program is calling bpf_trace_printk() and if so, kernel allocates trace_printk buffers and emits big 'this is debug only' banner.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-6-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
d9847d31 |
| 25-Mar-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
tracing: Allow BPF programs to call bpf_ktime_get_ns()
bpf_ktime_get_ns() is used by programs to compute time delta between events or as a timestamp
Signed-off-by: Alexei Starovoitov <ast@plumgrid.
tracing: Allow BPF programs to call bpf_ktime_get_ns()
bpf_ktime_get_ns() is used by programs to compute time delta between events or as a timestamp
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-5-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|
#
2541517c |
| 25-Mar-2015 |
Alexei Starovoitov <ast@plumgrid.com> |
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute user-defined BPF byte-code programs without being able to crash or hang the
tracing, perf: Implement BPF programs attached to kprobes
BPF programs, attached to kprobes, provide a safe way to execute user-defined BPF byte-code programs without being able to crash or hang the kernel in any way. The BPF engine makes sure that such programs have a finite execution time and that they cannot break out of their sandbox.
The user interface is to attach to a kprobe via the perf syscall:
struct perf_event_attr attr = { .type = PERF_TYPE_TRACEPOINT, .config = event_id, ... };
event_fd = perf_event_open(&attr,...); ioctl(event_fd, PERF_EVENT_IOC_SET_BPF, prog_fd);
'prog_fd' is a file descriptor associated with BPF program previously loaded.
'event_id' is an ID of the kprobe created.
Closing 'event_fd':
close(event_fd);
... automatically detaches BPF program from it.
BPF programs can call in-kernel helper functions to:
- lookup/update/delete elements in maps
- probe_read - wraper of probe_kernel_read() used to access any kernel data structures
BPF programs receive 'struct pt_regs *' as an input ('struct pt_regs' is architecture dependent) and return 0 to ignore the event and 1 to store kprobe event into the ring buffer.
Note, kprobes are a fundamentally _not_ a stable kernel ABI, so BPF programs attached to kprobes must be recompiled for every kernel version and user must supply correct LINUX_VERSION_CODE in attr.kern_version during bpf_prog_load() call.
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David S. Miller <davem@davemloft.net> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1427312966-8434-4-git-send-email-ast@plumgrid.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
show more ...
|