Vicente, you're right.
The fPIC option tells GCC not to use absolute addresses for global variables. Absolute referencing is fine on 32 bits, but on 64 bits every absolute address takes twice as much space (of course). So the idea behind fPIC is simply to use (32-bit) offset referencing. It costs a few more instructions and limits the addressable range to 2 GB, but it lets binaries stay small.
Short story: fPIC produces "relocatable" code, so it needs offset-based memory addressing, something that looks quite AMD64 specific/native to me. Let's sort it out.
Here's a sample C file:
Code:
int global = 0;

int test(int value)
{
    global = value;
    return ++global;
}

int main(void)
{
    test(29);
    return 0;
}
First compilation, without fPIC:
gcc a.c
objdump -d a.out
The test function looks like:
Code:
08048344 <test>:
8048344: 55 push %ebp
8048345: 89 e5 mov %esp,%ebp
8048347: 8b 45 08 mov 0x8(%ebp),%eax
804834a: a3 58 95 04 08 mov %eax,0x8049558
804834f: a1 58 95 04 08 mov 0x8049558,%eax
8048354: 83 c0 01 add $0x1,%eax
8048357: a3 58 95 04 08 mov %eax,0x8049558
804835c: a1 58 95 04 08 mov 0x8049558,%eax
8048361: 5d pop %ebp
8048362: c3 ret
Second test with fPIC:
gcc -fPIC a.c
objdump -d a.out
Code:
08048374 <test>:
8048374: 55 push %ebp
8048375: 89 e5 mov %esp,%ebp
8048377: e8 66 00 00 00 call 80483e2 <__i686.get_pc_thunk.cx>
804837c: 81 c1 1c 12 00 00 add $0x121c,%ecx
8048382: 8b 91 fc ff ff ff mov -0x4(%ecx),%edx
8048388: 8b 45 08 mov 0x8(%ebp),%eax
804838b: 89 02 mov %eax,(%edx)
804838d: 8b 81 fc ff ff ff mov -0x4(%ecx),%eax
8048393: 8b 00 mov (%eax),%eax
8048395: 8d 50 01 lea 0x1(%eax),%edx
8048398: 8b 81 fc ff ff ff mov -0x4(%ecx),%eax
804839e: 89 10 mov %edx,(%eax)
80483a0: 8b 81 fc ff ff ff mov -0x4(%ecx),%eax
80483a6: 8b 00 mov (%eax),%eax
80483a8: 5d pop %ebp
80483a9: c3 ret
This result is from a 32-bit Intel machine. Even without particular asm skills, it's quite clear that fPIC generates insane things: the fPIC version of the test() function calls another internal function (__i686.get_pc_thunk.cx, which only exists to put the current instruction address into ecx), and then goes through the ecx register to fetch the address of the global before every single access. 16 instructions, versus 10 without fPIC.
It's (a lot) slower, and the binary is larger!
(In fact, I don't care about binary size: with Raydium, our only goal is speed.) In this case, fPIC is a total failure.
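To make the difference between the two listings easier to see, here is a rough C model of both access patterns. This is only a sketch, not what GCC actually emits: got_slot_global is a made-up name standing in for the table entry that the fPIC code reads through -0x4(%ecx), and it is declared volatile only to force the repeated reloads seen in the unoptimised listing.

```c
/* Rough model of the two listings above; names other than `global`
 * are invented for this sketch. */

int global = 0;

/* Hypothetical stand-in for the table slot holding the address of
 * `global` (what the fPIC code reads at -0x4(%ecx)). `volatile`
 * forces a fresh load each time, as in the unoptimised listing. */
int *volatile got_slot_global = &global;

/* Without fPIC: the address of `global` is a constant in the code. */
int test_direct(int value)
{
    global = value;
    return ++global;
}

/* With fPIC on 32-bit x86: each access first fetches the address. */
int test_via_slot(int value)
{
    int *p;
    int tmp;

    p = got_slot_global;    /* mov -0x4(%ecx),%edx */
    *p = value;             /* mov %eax,(%edx)     */

    p = got_slot_global;    /* mov -0x4(%ecx),%eax */
    tmp = *p + 1;           /* mov (%eax),%eax ; lea 0x1(%eax),%edx */

    p = got_slot_global;    /* mov -0x4(%ecx),%eax */
    *p = tmp;               /* mov %edx,(%eax)     */

    p = got_slot_global;    /* mov -0x4(%ecx),%eax */
    return *p;              /* mov (%eax),%eax     */
}
```

Both functions return the same value (30 for an argument of 29); the second one simply pays one extra memory load per access to the global, plus the thunk call to set up ecx, which this model leaves out.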
So, let's please do the same test on AMD64 (where GCC should be able to use some CPU features to generate "light" relocatable code) and on Intel x86_64, where I think such features are not available, though I may be completely wrong.
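For anyone repeating the test on another machine, something like the following could help label and compare results. It is a sketch under a few assumptions: target_arch, pic_level, time_test and report are names I made up, the timing uses plain clock() and is only a coarse indication, and the file should be built without optimisation (as above) so that test() is not inlined. __PIC__, __x86_64__ and __i386__ are GCC predefined macros.

```c
#include <stdio.h>
#include <time.h>

int global = 0;

int test(int value)
{
    global = value;
    return ++global;
}

/* Target architecture, taken from GCC's predefined macros. */
const char *target_arch(void)
{
#if defined(__x86_64__)
    return "x86_64";
#elif defined(__i386__)
    return "i386";
#else
    return "other";
#endif
}

/* 0 = no PIC, 1 = -fpic, 2 = -fPIC (value of GCC's __PIC__ macro). */
int pic_level(void)
{
#ifdef __PIC__
    return __PIC__;
#else
    return 0;
#endif
}

/* Calls test() `calls` times and returns the CPU time used, in seconds.
 * The sum of the results is stored in *checksum so the loop does
 * observable work. */
double time_test(long calls, long *checksum)
{
    long sum = 0;
    long i;
    clock_t start = clock();

    for (i = 0; i < calls; i++)
        sum += test((int)(i & 0xff));

    *checksum = sum;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Prints one result line, to be compared between builds and machines. */
void report(long calls)
{
    long checksum;
    double seconds = time_test(calls, &checksum);

    printf("%s, PIC level %d: %ld calls in %.3f s (checksum %ld)\n",
           target_arch(), pic_level(), calls, seconds, checksum);
}
```

Calling report() with a large count from main(), once in a build with -fPIC and once without, gives two lines to compare. Note that some toolchains build position-independent executables by default, in which case the PIC level may be non-zero even without -fPIC on the command line.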
edit: made the post a bit more readable.