My goal is to create a PCIe transaction with a payload larger than 64b (bits). To do that, I need to read from an address returned by ioremap().
For 128b and 256b I can use the xmm and ymm registers respectively, and that works as expected.
Now I'd like to do the same with the 512b zmm registers (memory-like storage?!).
Code under a license that I'm not allowed to show here uses inline assembly for the 256b case:
void __iomem *addr;
uint8_t datareg[32];
[...]
// Read memory address to ymm (to have 256b at once):
asm volatile("vmovdqa %0,%%ymm1" : : "m"(*(volatile uint8_t * __force) addr));
// Copy ymm data to stack data: (to be able to use that in a gcc handled code)
asm volatile("vmovdqa %%ymm1,%0" :"=m"(datareg): :"memory");
This is to be used in a kernel module compiled with EXTRA_CFLAGS += -mavx2 -mavx512f to support AVX-512.
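To make the 256b pattern above easier to experiment with, here is a minimal userspace sketch, not the licensed code. The helper name read256 is hypothetical, and the sketch makes two assumptions: both buffers are 32-byte aligned (vmovdqa faults on unaligned addresses), and the load and store are merged into a single asm statement with ymm1 declared as clobbered, so the compiler cannot reuse the register in between. In a kernel module the same asm would additionally have to sit between kernel_fpu_begin() and kernel_fpu_end(). Whether the load reaches the device as one PCIe read depends on the platform and is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original code): copy 32 bytes from src
 * to dst using one 256-bit load and one 256-bit store.
 * Assumption: src and dst are both 32-byte aligned. */
static void read256(const volatile void *src, uint8_t dst[32])
{
    assert(((uintptr_t)src & 31) == 0 && ((uintptr_t)dst & 31) == 0);
#if defined(__x86_64__)
    if (__builtin_cpu_supports("avx")) {
        /* Load and store live in ONE asm statement, and ymm1 is listed as
         * clobbered, so nothing can touch the register between the two
         * instructions. The array-typed "m" operands tell the compiler
         * exactly which 32 bytes are read and written. */
        __asm__ volatile(
            "vmovdqa %1, %%ymm1\n\t"
            "vmovdqa %%ymm1, %0"
            : "=m"(*(uint8_t (*)[32])dst)
            : "m"(*(const volatile uint8_t (*)[32])src)
            : "ymm1");
        return;
    }
#endif
    /* Fallback for CPUs or targets without AVX: plain byte copy. */
    memcpy(dst, (const void *)src, 32);
}
```

A call would look like read256(addr, datareg), with datareg declared as uint8_t datareg[32] __attribute__((aligned(32))).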
- Why does this example use ymm1 and not a different register (ymm0, ymm2, ymm3, ..., ymm15)?
- How can I read an address into a 512b zmm register?
- How can I be sure the register won't be overwritten between the two asm lines?
When I simply replace ymm with zmm, gcc reports Error: operand size mismatch for `vmovdqa'.
If that code isn't correct or isn't best practice, let's solve that first, since I've only just started digging into this.
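On the operand size mismatch: the AVX-512 (EVEX) forms of the aligned integer move carry an element-size suffix, vmovdqa32 or vmovdqa64, and the plain vmovdqa mnemonic only accepts xmm and ymm operands, which would explain the assembler error. The following is a userspace sketch under that reading, not a verified kernel solution. The helper name read512 is hypothetical, the buffers are assumed to be 64-byte aligned, and the code falls back to memcpy when the CPU does not report AVX-512F.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original code): copy 64 bytes from src
 * to dst using one 512-bit load and one 512-bit store.
 * Assumption: src and dst are both 64-byte aligned. */
static void read512(const volatile void *src, uint8_t dst[64])
{
    assert(((uintptr_t)src & 63) == 0 && ((uintptr_t)dst & 63) == 0);
#if defined(__x86_64__)
    if (__builtin_cpu_supports("avx512f")) {
        /* vmovdqa64 is the suffixed mnemonic that accepts zmm operands.
         * Without a mask register the element size does not change which
         * bytes are moved. zmm1 is declared clobbered. */
        __asm__ volatile(
            "vmovdqa64 %1, %%zmm1\n\t"
            "vmovdqa64 %%zmm1, %0"
            : "=m"(*(uint8_t (*)[64])dst)
            : "m"(*(const volatile uint8_t (*)[64])src)
            : "zmm1");
        return;
    }
#endif
    /* Fallback for CPUs or targets without AVX-512F: plain byte copy. */
    memcpy(dst, (const void *)src, 64);
}
```

As with the 256b case, kernel code would need to wrap this in kernel_fpu_begin() and kernel_fpu_end(), and the sketch says nothing about how the load is split on the PCIe link.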