ASSERT FAILED: drbd_worker - big trouble

Hi guys,

I am having serious trouble with DRBD.

The resources detach themselves, and I get a lot of these kernel log messages on both nodes:
> [11304.180242] PAX: refcount overflow detected in: drbd2_worker:2212, uid/euid: 0/0

[11304.180243] CPU 1

[11304.180243] Modules linked in: ipt_LOG drbd lru_cache ip6table_filter ip6table_mangle ip6_tables ipt_REJECT xt_recent xt_state xt_tcpudp iptable_filter iptable_mangle kvm_intel kvm authenc esp4 ah4 xfrm4_mode_transport deflate zlib_deflate ctr twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common camellia serpent blowfish_generic blowfish_x86_64 blowfish_common cast5 des_generic xcbc rmd160 sha512_generic crypto_null af_key xt_addrtype i7core_edac ipt_MASQUERADE nfsd shpchp iptable_nat nf_nat nfs nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables lockd psmouse dcdbas mac_hid serio_raw acpi_power_meter wmi edac_core bridge fscache tpm_tis auth_rpcgss stp nfs_acl sunrpc lp parport usbhid hid ses enclosure megaraid_sas bnx2

[11304.180261]

[11304.180262] Pid: 2212, comm: drbd2_worker Not tainted 3.2.29-grsec

[11304.180264] RIP: 0010:[] [] bm_page_io_async+0x1a3/0x250 [drbd]

[11304.180267] RSP: 0018:ffff880612fb5c60 EFLAGS: 00000a12

[11304.180268] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000000

[11304.180269] RDX: 000000000000a23e RSI: ffffffff8132d670 RDI: ffff8805fae32144

[11304.180271] RBP: ffff880612fb5cf0 R08: 0000000000000000 R09: ffff88060907c0c0

[11304.180272] R10: 00000001b77407b0 R11: 0000000000000000 R12: ffff88060907c0c0

[11304.180273] R13: ffff8806125ba000 R14: ffff880612fb5d20 R15: ffff8805fc629200

[11304.180274] FS: 0000000000000000(0000) GS:ffff88063f620000(0000) knlGS:0000000000000000

[11304.180275] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b

[11304.180276] CR2: 000001866c3c1158 CR3: 00000000016bf000 CR4: 00000000000006f0

[11304.180277] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000

[11304.180279] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400

[11304.180280] Process drbd2_worker (pid: 2212, threadinfo ffff8805fa419630, task ffff8805fa419100)

[11304.180281] Stack:

[11304.180281] ffff8805fa419630 00000000dbb9ffff fffe484800000001 00000000dbb9fff8

[11304.180283] ffff880612fb5c90 ffffea0018288670 ffff880612fb5cb0 ffff8805fa419630

[11304.180285] 0000000000000a9b 0000000000000000 0000000000002a9a ffff8806125ba100

[11304.180288] Call Trace:

[11304.180291] [] bm_rw+0x178/0x440 [drbd]

[11304.180294] [] drbd_bm_write+0x15/0x20 [drbd]

[11304.180298] [] w_bitmap_io+0xe2/0x2a0 [drbd]

[11304.180302] [] drbd_worker+0x21e/0x4c0 [drbd]

[11304.180306] [] ? drbd_open+0xb0/0xb0 [drbd]

[11304.180310] [] drbd_thread_setup+0x64/0xf0 [drbd]

[11304.180314] [] ? drbd_open+0xb0/0xb0 [drbd]

[11304.180316] [] kthread+0x8c/0xa0

[11304.180318] [] kernel_thread_helper+0x4/0x10

[11304.180320] [] ? flush_kthread_worker+0xa0/0xa0

[11304.180322] [] ? gs_change+0x13/0x13

[11304.180322] Code: 80 4d 89 74 24 58 4c 89 e6 49 c7 44 24 50 80 da 3f a0 e8 a1 2d f0 e0 f0 41 01 9d d8 0a 00 00 71 0a f0 41 29 9d d8 0a 00 00 cd 04 <48> 83 c4 68 5b 41 5c 41 5d 41 5e 41 5f 5d c3 66 0f 1f 44 00 00

[11304.180335] Call Trace:

[11304.180338] [] bm_rw+0x178/0x440 [drbd]

[11304.180341] [] drbd_bm_write+0x15/0x20 [drbd]

[11304.180345] [] w_bitmap_io+0xe2/0x2a0 [drbd]

[11304.180349] [] drbd_worker+0x21e/0x4c0 [drbd]

[11304.180353] [] ? drbd_open+0xb0/0xb0 [drbd]

[11304.180357] [] drbd_thread_setup+0x64/0xf0 [drbd]

[11304.180362] [] ? drbd_open+0xb0/0xb0 [drbd]

[11304.180363] [] kthread+0x8c/0xa0

[11304.180365] [] kernel_thread_helper+0x4/0x10

[11304.180367] [] ? flush_kthread_worker+0xa0/0xa0

[11304.180369] [] ? gs_change+0x13/0x13

[11304.180371] block drbd2: bitmap WRITE of 5871 pages took 265 jiffies

[11304.180382] block drbd2: 734 GB (192322088 bits) marked out-of-sync by on disk bit-map.

[11304.180386] block drbd2: ASSERT FAILED: drbd_worker: (get_t_state(thi) == Running) in drivers/block/drbd/drbd_worker.c:1645

[11304.180465] block drbd2: Connection closed

[11304.180469] block drbd2: conn( BrokenPipe -> Unconnected )

[11304.180471] block drbd2: receiver terminated

[11304.180472] block drbd2: Restarting drbd2_receiver

[11304.180473] block drbd2: receiver (re)started

[11304.180476] block drbd2: conn( Unconnected -> WFConnection )
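As far as I understand it, the `PAX: refcount overflow detected` line comes from grsecurity's PAX_REFCOUNT hardening, which instruments atomic additions so that a signed overflow traps instead of silently wrapping. The `Code:` bytes above seem to show exactly that pattern: a `lock add` of EBX (RBX is 8 in the register dump), a `jno`, a `lock sub` that undoes the add, and an `int $4`. Here is a rough Python model of that check, assuming a 32-bit signed counter like `atomic_t`; `checked_add`, `adds_until_overflow` and `RefcountOverflow` are names I made up, not kernel functions:

```python
# Assumption: the counter behaves like a 32-bit signed atomic_t.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


class RefcountOverflow(Exception):
    """Raised where PAX_REFCOUNT would trap (int $4) instead of wrapping."""


def checked_add(counter: int, delta: int) -> int:
    """Hypothetical model of an instrumented atomic_add().

    The kernel code does the add, tests the CPU overflow flag (jno),
    subtracts the delta again and traps. Here the sum is checked against
    the 32-bit range first, so the caller's counter is simply left as-is.
    """
    result = counter + delta
    if not INT_MIN <= result <= INT_MAX:
        raise RefcountOverflow(f"counter={counter}, delta={delta}")
    return result


def adds_until_overflow(start: int, delta: int) -> tuple[int, int]:
    """Keep adding `delta` until the check fires; return (adds, last value)."""
    counter, adds = start, 0
    while True:
        try:
            counter = checked_add(counter, delta)
        except RefcountOverflow:
            return adds, counter
        adds += 1


if __name__ == "__main__":
    # A counter that only ever grows by 8 eventually hits INT_MAX.
    print(adds_until_overflow(INT_MAX - 20, 8))  # (2, 2147483643)
```

So any counter that is only ever increased will eventually trip this check, whether or not wrapping would have been harmless for that particular counter; I cannot tell which case applies here.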

Does this mean anything to anyone?

What is the risk to my data?
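For scale: DRBD's bitmap tracks one bit per 4 KiB block of the backing device, so the `marked out-of-sync` line can be cross-checked from its bit count (the log's "GB" only matches if it is read as binary GiB). A quick sketch, with `out_of_sync_gib` as a made-up helper name:

```python
# Assumption: one bitmap bit covers 4 KiB of the backing device.
BM_BLOCK_SIZE = 4096  # bytes per bitmap bit


def out_of_sync_gib(bits: int) -> float:
    """Amount of storage (GiB) covered by `bits` out-of-sync bitmap bits."""
    return bits * BM_BLOCK_SIZE / 2**30


if __name__ == "__main__":
    # Bit count from the "marked out-of-sync by on disk bit-map" line.
    print(round(out_of_sync_gib(192322088)))  # 734
```

So roughly 734 GiB is currently flagged for resync on drbd2.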

Thank you
