Stefan Kristiansson
2014-04-10 20:09:46 UTC
So, I brought up the issue of the OpenRISC 1000 ISA missing support for atomic
operations for the v1.0 architecture proposals and we had a quick discussion
about it at orconf2012. The conclusion then was that we needed something
more concrete than just "we should add it" and I promised to look into it.
As I always try to keep my promises, albeit sometimes slowly, here we are.
(In all honesty, the real reason I started to look into this is that I've
started a musl port, and forging on with the 'atomic syscall' path there
just felt wrong.)
Anyway, I've done some research since, and I discovered that at one point
in history (up to around 2005) there were atomic operation instructions
described (although rather vaguely) in the arch specification.
I found the old arch specs here:
http://opencores.org/websvn,listing?repname=or1k&path=%2For1k%2Ftrunk%2Fdocs%2F#path_or1k_trunk_docs_
To be more precise, this was what was said about atomicity:
"7.3 Atomicity
A memory access is atomic if it is always performed in its entirety with no
visible fragmentation. Atomic memory accesses are specifically required to
implement software semaphores and other shared structures in systems where two
different processes on the same processor, or two different processors in a
multiprocessor environment, access the same memory location with intent to
modify it.
The OpenRISC 1000 architecture provides two dedicated instructions that together
perform an atomic read-modify-write operation.
l.lwa rD, I(rA)
l.swa I(rA), rB
Instruction l.lwa loads single word from memory, creating a reservation for a
subsequent conditional store operation. A special register, invisible to the
programmer, is used to hold the address of the memory location, which is used in
the atomic read-modify-write operation.
The reservation for a subsequent l.swa is cancelled if another master reads the
same memory location (snoop hit), another l.lwa is executed or if the software
explicitly clears the reservation register.
If a reservation is still valid when the corresponding l.swa is executed, l.swa
stores general-purpose register rB into the memory.
If reservation was cancelled, l.swa is executed as no operation."
There are a couple of things that are left undefined in the text above,
but it gave me a base to start off from (if nothing else, the names of the
instructions).
What I am proposing is to revise the text above and bring it back to the arch
specification.
I am also proposing that the revised text should contain the following bullet
points (with remarks and openings for discussion):
- A load link can be broken by any of:
1) another l.lwa instruction
2) another l.swa instruction
3) another store to the linked address
4) a context switch (exception)
- The granularity of the link is a word.
(I'm certainly open to discussion on this one; e.g. a cache line could make
sense too)
- The result (1 for success and 0 for fail) of the store conditional is stored
in the source register of the l.swa instruction.
I.e. 'rB' in 'l.swa I(rA), rB'.
(I was in two minds between choosing the flag bit, the carry bit or
the l.swa source register. The reason I chose the register is that the
flag is easily a critical path in the RTL implementations, and the carry bit
requires l.addc, which isn't always included (despite being a mandatory
instruction).)
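
To make the bullet points above a bit more tangible, here is a rough Python
model of the proposed link rules. It is only an illustrative sketch, not the
or1ksim implementation: the class and method names (LinkModel, lwa, swa,
store, exception, atomic_add) are made up for this example, and it assumes
that an l.swa to a word other than the linked one fails and that a plain
store from any master to the linked word breaks the link.

```python
class LinkModel:
    """Toy model of the proposed l.lwa / l.swa link rules (hypothetical names)."""

    def __init__(self):
        self.mem = {}      # word-aligned address -> 32-bit value
        self.link = None   # linked word address, or None when no link is held

    @staticmethod
    def _word(addr):
        # The proposed link granularity is one 32-bit word.
        return addr & ~0x3

    def lwa(self, addr):
        # Rule 1: a new l.lwa replaces (breaks) any earlier link.
        self.link = self._word(addr)
        return self.mem.get(self.link, 0)

    def swa(self, addr, value):
        # Stores only if the link is intact and covers this word (assumption:
        # an l.swa to a different word than the linked one fails).
        ok = self.link == self._word(addr)
        if ok:
            self.mem[self.link] = value & 0xFFFFFFFF
        self.link = None           # Rule 2: any l.swa breaks the link.
        return 1 if ok else 0      # The value that would be written back to rB.

    def store(self, addr, value):
        # Rule 3: an ordinary store to the linked word breaks the link.
        word = self._word(addr)
        self.mem[word] = value & 0xFFFFFFFF
        if self.link == word:
            self.link = None

    def exception(self):
        # Rule 4: a context switch (exception) breaks the link.
        self.link = None


def atomic_add(model, addr, delta):
    """Retry loop in the usual load-link/store-conditional style."""
    while True:
        old = model.lwa(addr)
        if model.swa(addr, old + delta):
            return old
```

The retry loop in atomic_add is the pattern software would build on top of
the two instructions: reload and try again whenever the result in rB is 0.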
As a proof of concept, I've implemented the behaviour described above in
binutils and or1ksim, and written a set of tests for or1ksim to verify its
behaviour.
The 6-bit opcode I used for l.lwa is 0x1b and the opcode for l.swa is 0x33.
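
For anyone who wants to spot these in a disassembly or memory dump, the
following small Python sketch shows where those opcode values end up in an
instruction word. It assumes the usual OpenRISC 1000 layout with the 6-bit
major opcode in bits 31..26 of the 32-bit instruction; the helper names are
invented for this example, and the operand fields are deliberately left out.

```python
OPC_LWA = 0x1B  # proposed major opcode for l.lwa
OPC_SWA = 0x33  # proposed major opcode for l.swa


def major_opcode(insn):
    """Return the 6-bit major opcode from bits 31..26 of a 32-bit word."""
    return (insn >> 26) & 0x3F


def with_opcode(opcode, low26=0):
    """Place a 6-bit opcode above 26 bits of (unspecified) operand fields."""
    return ((opcode & 0x3F) << 26) | (low26 & 0x03FFFFFF)


def is_atomic_insn(insn):
    """True if the word carries one of the two proposed opcodes."""
    return major_opcode(insn) in (OPC_LWA, OPC_SWA)
```

With that layout, an l.lwa word has 0x6c in its top byte with the low two
bits of that byte clear, and an l.swa word has 0xcc likewise.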
I'm not going to post the patches for binutils and or1ksim just yet,
because I want to let people raise their voices before doing that.
But if there are no objections to what I propose here, I'll probably move
forward with what I have.
However, if anyone is interested in looking at the patches anyway, I've
made them public here:
https://github.com/skristiansson/or1k-src/commit/e7d1ef5f9c2f698f4e41cd6e3e739df1201fe18c
(The binutils patch still needs some work; as it stands, the cgen simulator
will not grok the atomicity property of the instructions)
and here:
https://github.com/skristiansson/or1ksim/commit/3afc310f4e6c7aa50dd823ab36e2fb365a1a0de7
Stefan