Perhaps you have to deal with message buffers in general, or with serialized arrays of mixed types, i.e., different arrays packed together into a generic char buffer on the host side and unpacked again on the device side. In that case you have to take care to align the data correctly; otherwise the kernel will fail at run time with a misaligned memory access. An 8-byte (64-bit) value such as a double must start at an address that is a multiple of 8 bytes.
If there is a char array with packed data in this order:
[3x int, 12x double]
... then misalignment occurs. The three ints occupy 12 bytes, so the first double starts at byte offset 12, which is not a multiple of 8. A kernel that unpacks the buffer by reading the doubles in place will fail. cuda-memcheck points you to the problem with this hint:
========= Invalid __global__ read of size 8
========= at 0x00000248 in kernel(char*, unsigned int)
========= by thread (0,0,0) in block (0,0,0)
========= Address 0x500340004 is misaligned
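The arithmetic behind that report can be checked on the host. The following is only a small sketch: the names `is_aligned`, `kIntSize`, `kDoubleSize` and `kDoubleOffset` are made up for illustration, and it assumes the usual sizes of 4 bytes for int and 8 bytes for double.

```cpp
#include <cstddef>
#include <cstdint>

// True if `value` (a byte offset or an address) is a multiple of `alignment`.
constexpr bool is_aligned(std::uint64_t value, std::uint64_t alignment) {
    return value % alignment == 0;
}

// Assumption: 4-byte int and 8-byte double.
constexpr std::size_t kIntSize = 4;
constexpr std::size_t kDoubleSize = 8;

// Layout [3x int, 12x double]: the doubles begin right after the three ints.
constexpr std::size_t kDoubleOffset = 3 * kIntSize;  // 12 bytes

// Offset 12 is not a multiple of 8, so the first double is misaligned
// whenever the buffer itself starts at an 8-byte aligned address.
static_assert(!is_aligned(kDoubleOffset, kDoubleSize),
              "doubles are misaligned in [3x int, 12x double]");

// The address from the cuda-memcheck report leaves a remainder of 4 modulo 8.
static_assert(0x500340004ULL % 8 == 4,
              "reported address is 4 bytes past an 8-byte boundary");
```

Both `static_assert`s hold at compile time, which matches the size-8 read at an address ending in 4 shown in the report.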
To avoid this, you can place the data in descending order of type size:
[12x double, 3x int]
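Reordering works because every block then starts at a multiple of its own element size, provided the buffer itself starts at an aligned address (memory returned by cudaMalloc is aligned to at least 256 bytes). If the order cannot be changed, padding is an alternative. The sketch below compares both options; `align_up`, `PackedLayout`, `doubles_first` and `ints_first_padded` are hypothetical helpers, not part of any CUDA API, and the sizes 4 and 8 are again assumed for int and double.

```cpp
#include <cstddef>

// Rounds `offset` up to the next multiple of `alignment`.
// Assumption: `alignment` is a power of two.
constexpr std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical description of where each array lives inside the char buffer.
struct PackedLayout {
    std::size_t int_offset;     // byte offset of the first int
    std::size_t double_offset;  // byte offset of the first double
    std::size_t total_bytes;    // size of the whole packed buffer
};

// [n_double x double, n_int x int]: largest type first, no padding needed.
constexpr PackedLayout doubles_first(std::size_t n_double, std::size_t n_int) {
    return {n_double * 8, 0, n_double * 8 + n_int * 4};
}

// [n_int x int, padding, n_double x double]: keep the order, pad before the doubles.
constexpr PackedLayout ints_first_padded(std::size_t n_int, std::size_t n_double) {
    return {0, align_up(n_int * 4, 8), align_up(n_int * 4, 8) + n_double * 8};
}
```

For the example above, `doubles_first(12, 3)` puts the ints at offset 96 in a 108-byte buffer, while `ints_first_padded(3, 12)` moves the doubles from offset 12 to 16 at the cost of 4 padding bytes (112 bytes in total).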
So remember to think about alignment next time (that goes for me, too!).
If you just came here to see some old-fashioned serialization in action, here it is: