Searched hist:"6094628 bfd94323fc1cea05ec2c6affd98c18f7f" (Results 1 – 2 of 2) sorted by relevance
/openbmc/linux/net/rds/ |
H A D | ib_send.c | diff 18fc25c94eadc52a42c025125af24657a93638c0 Mon Dec 02 17:41:39 CST 2013 Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> rds: prevent BUG_ON triggered on congestion update to loopback
After congestion update on a local connection, when rds_ib_xmit returns less bytes than that are there in the message, rds_send_xmit calls back rds_ib_xmit with an offset that causes BUG_ON(off & RDS_FRAG_SIZE) to trigger.
For a 4Kb PAGE_SIZE rds_ib_xmit returns min(8240,4096)=4096 when actually the message contains 8240 bytes. rds_send_xmit thinks there is more to send and calls rds_ib_xmit again with a data offset "off" of 4096-48(rds header) =4048 bytes thus hitting the BUG_ON(off & RDS_FRAG_SIZE) [RDS_FRAG_SIZE=4k].
The commit 6094628bfd94323fc1cea05ec2c6affd98c18f7f "rds: prevent BUG_ON triggering on congestion map updates" introduced this regression. That change was addressing the triggering of a different BUG_ON in rds_send_xmit() on PowerPC architecture with 64Kbytes PAGE_SIZE: BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); This was the sequence it was going through: (rds_ib_xmit) /* Do not send cong updates to IB loopback */ if (conn->c_loopback && rm->m_inc.i_hdr.h_flags & RDS_FLAG_CONG_BITMAP) { rds_cong_map_updated(conn->c_fcong, ~(u64) 0); return sizeof(struct rds_header) + RDS_CONG_MAP_BYTES; } rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 0 + 8240 - 48 (rds header accounted only the first time) = 8192 c_xmit_data_off < 65536 (sg->length), so calls rds_ib_xmit again rds_ib_xmit returns 8240 rds_send_xmit: c_xmit_data_off = 8192 + 8240 = 16432, calls rds_ib_xmit again and so on (c_xmit_data_off 24672,32912,41152,49392,57632) rds_ib_xmit returns 8240 On this iteration this sequence causes the BUG_ON in rds_send_xmit: while (ret) { tmp = min_t(int, ret, sg->length - conn->c_xmit_data_off); [tmp = 65536 - 57632 = 7904] conn->c_xmit_data_off += tmp; [c_xmit_data_off = 57632 + 7904 = 65536] ret -= tmp; [ret = 8240 - 7904 = 336] if (conn->c_xmit_data_off == sg->length) { conn->c_xmit_data_off = 0; sg++; conn->c_xmit_sg++; BUG_ON(ret != 0 && conn->c_xmit_sg == rm->data.op_nents); [c_xmit_sg = 1, rm->data.op_nents = 1]
What the current fix does: Since the congestion update over loopback is not actually transmitted as a message, all that rds_ib_xmit needs to do is let the caller think the full message has been transmitted and not return partial bytes. It will return 8240 (RDS_CONG_MAP_BYTES+48) when PAGE_SIZE is 4Kb. And 64Kb+48 when page size is 64Kb.
Reported-by: Josh Hunt <joshhunt00@gmail.com> Tested-by: Honggang Li <honli@redhat.com> Acked-by: Bang Nguyen <bang.nguyen@oracle.com> Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net> diff 6094628bfd94323fc1cea05ec2c6affd98c18f7f Wed Mar 02 00:28:22 CST 2011 Neil Horman <nhorman@tuxdriver.com> rds: prevent BUG_ON triggering on congestion map updates
Recently had this bug halt reported to me:
kernel BUG at net/rds/send.c:329! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt dm_mod [last unloaded: scsi_wait_scan] NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770 REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64) MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 44000022 XER: 00000000 TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0 GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030 GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030 GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000 GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00 GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001 GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000 GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860 GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8 NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds] LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] Call Trace: [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] (unreliable) [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds] [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0 [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0 [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70 Instruction dump: 4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c 7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000 Kernel panic - not syncing: Fatal exception Call Trace: [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable) [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4 [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0 [c000000175cab750] [c000000000030000] ._exception+0x110/0x220 [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180
Signed-off-by: David S. Miller <davem@davemloft.net>
|
H A D | loop.c | diff 6094628bfd94323fc1cea05ec2c6affd98c18f7f Wed Mar 02 00:28:22 CST 2011 Neil Horman <nhorman@tuxdriver.com> rds: prevent BUG_ON triggering on congestion map updates
Recently had this bug halt reported to me:
kernel BUG at net/rds/send.c:329! Oops: Exception in kernel mode, sig: 5 [#1] SMP NR_CPUS=1024 NUMA pSeries Modules linked in: rds sunrpc ipv6 dm_mirror dm_region_hash dm_log ibmveth sg ext4 jbd2 mbcache sd_mod crc_t10dif ibmvscsic scsi_transport_srp scsi_tgt dm_mod [last unloaded: scsi_wait_scan] NIP: d000000003ca68f4 LR: d000000003ca67fc CTR: d000000003ca8770 REGS: c000000175cab980 TRAP: 0700 Not tainted (2.6.32-118.el6.ppc64) MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 44000022 XER: 00000000 TASK = c00000017586ec90[1896] 'krdsd' THREAD: c000000175ca8000 CPU: 0 GPR00: 0000000000000150 c000000175cabc00 d000000003cb7340 0000000000002030 GPR04: ffffffffffffffff 0000000000000030 0000000000000000 0000000000000030 GPR08: 0000000000000001 0000000000000001 c0000001756b1e30 0000000000010000 GPR12: d000000003caac90 c000000000fa2500 c0000001742b2858 c0000001742b2a00 GPR16: c0000001742b2a08 c0000001742b2820 0000000000000001 0000000000000001 GPR20: 0000000000000040 c0000001742b2814 c000000175cabc70 0800000000000000 GPR24: 0000000000000004 0200000000000000 0000000000000000 c0000001742b2860 GPR28: 0000000000000000 c0000001756b1c80 d000000003cb68e8 c0000001742b27b8 NIP [d000000003ca68f4] .rds_send_xmit+0x4c4/0x8a0 [rds] LR [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] Call Trace: [c000000175cabc00] [d000000003ca67fc] .rds_send_xmit+0x3cc/0x8a0 [rds] (unreliable) [c000000175cabd30] [d000000003ca7e64] .rds_send_worker+0x54/0x100 [rds] [c000000175cabdb0] [c0000000000b475c] .worker_thread+0x1dc/0x3c0 [c000000175cabed0] [c0000000000baa9c] .kthread+0xbc/0xd0 [c000000175cabf90] [c000000000032114] .kernel_thread+0x54/0x70 Instruction dump: 4bfffd50 60000000 60000000 39080001 935f004c f91f0040 41820024 813d017c 7d094a78 7d290074 7929d182 394a0020 <0b090000> 40e2ff68 4bffffa4 39200000 Kernel panic - not syncing: Fatal exception Call Trace: [c000000175cab560] [c000000000012e04] .show_stack+0x74/0x1c0 (unreliable) [c000000175cab610] [c0000000005a365c] .panic+0x80/0x1b4 [c000000175cab6a0] [c00000000002fbcc] .die+0x21c/0x2a0 [c000000175cab750] [c000000000030000] ._exception+0x110/0x220 [c000000175cab910] [c000000000004b9c] program_check_common+0x11c/0x180
Signed-off-by: David S. Miller <davem@davemloft.net>
|