Discussion:
[OpenRISC] or1200/mor1kx profiling result difference
Iñigo Muguruza
2014-06-17 15:00:39 UTC
Permalink
Dear OpenRISC community,

I write this mail to inform about an issue I have found when I analyzed the
outcome data coming from or1200 and mor1kx profiling. I am working in
making a OpenRISC tutorial to help people install and execute the diverse
open-soruce SoC tools. In this tutorial a profiling test for or1200/mor1kx
is made. Surprisingly, the results are really different, inducting to think
about a fault/bug. (results in the attached txts)

In the mor1kx, _reset and or1k_dmu_disable routines, takes really a big
percentage of the execution, while in or1200 is does not happen the same.

The tools that I have used to make the profiling are the next ones:

- De0_nano board
- mor1kx and or1200 .sof generated with quartus tools. (changing defines
from orpsoc-defines.v)
- Simple test code, basically two decrementing loops that are called by the
main. It also prints messages at the end of each loop (see prof-test.c from
the link below).
- OpenOCD
- Telnet client for running the test.
- OpenOCD patch. Enables to execute profiling in or1200/mor1kx. gmon file
data writing for big endian targets. http://openocd.zylin.com/#/c/2168/

The procedure, is the one described in the "profiling" section of this
tutorial:

http://wiki.mintsoc.org/doku.php?id=run_programs_on_the_altera_de0_nano_board#profiling

Best regards,

Inigo Muguruza
Franck Jullien
2014-06-17 18:57:12 UTC
Permalink
Post by Iñigo Muguruza
Dear OpenRISC community,
I write this mail to inform about an issue I have found when I analyzed the
outcome data coming from or1200 and mor1kx profiling. I am working in making
a OpenRISC tutorial to help people install and execute the diverse
open-soruce SoC tools. In this tutorial a profiling test for or1200/mor1kx
is made. Surprisingly, the results are really different, inducting to think
about a fault/bug. (results in the attached txts)
In the mor1kx, _reset and or1k_dmu_disable routines, takes really a big
percentage of the execution, while in or1200 is does not happen the same.
- De0_nano board
- mor1kx and or1200 .sof generated with quartus tools. (changing defines
from orpsoc-defines.v)
- Simple test code, basically two decrementing loops that are called by the
main. It also prints messages at the end of each loop (see prof-test.c from
the link below).
- OpenOCD
- Telnet client for running the test.
- OpenOCD patch. Enables to execute profiling in or1200/mor1kx. gmon file
data writing for big endian targets. http://openocd.zylin.com/#/c/2168/
The procedure, is the one described in the "profiling" section of this
http://wiki.mintsoc.org/doku.php?id=run_programs_on_the_altera_de0_nano_board#profiling
Best regards,
Inigo Muguruza
_______________________________________________
OpenRISC mailing list
http://lists.openrisc.net/listinfo/openrisc
Hi Iñigo,

I don't have answer to your questions.
However, as you work on this, can you try to apply what is suggested
here: http://openocd.zylin.com/#/c/2168/1/src/target/target.c,cm

Thanks,

Franck.
Jose Teixeira de Sousa
2014-06-17 19:10:15 UTC
Permalink
Hi,

Iñigo and I work together. The results he presented were obtained with the
mentioned OpenOCD patch already in place, which I submitted recently.

There is definitely a difference between OR1200 and MOR1KX in handling the
OpenOCD requests via the ADBG and debug interface.

The purpose of reporting this problem is to help towards fixing these
problems in MOR1KX.

Unrelated to this, has anyone noticed that OR1200 simulates much faster in
Verilator as compared to MOR1KX?

Thanks,

Jose
Post by Iñigo Muguruza
Post by Iñigo Muguruza
Dear OpenRISC community,
I write this mail to inform about an issue I have found when I analyzed
the
Post by Iñigo Muguruza
outcome data coming from or1200 and mor1kx profiling. I am working in
making
Post by Iñigo Muguruza
a OpenRISC tutorial to help people install and execute the diverse
open-soruce SoC tools. In this tutorial a profiling test for
or1200/mor1kx
Post by Iñigo Muguruza
is made. Surprisingly, the results are really different, inducting to
think
Post by Iñigo Muguruza
about a fault/bug. (results in the attached txts)
In the mor1kx, _reset and or1k_dmu_disable routines, takes really a big
percentage of the execution, while in or1200 is does not happen the same.
- De0_nano board
- mor1kx and or1200 .sof generated with quartus tools. (changing defines
from orpsoc-defines.v)
- Simple test code, basically two decrementing loops that are called by
the
Post by Iñigo Muguruza
main. It also prints messages at the end of each loop (see prof-test.c
from
Post by Iñigo Muguruza
the link below).
- OpenOCD
- Telnet client for running the test.
- OpenOCD patch. Enables to execute profiling in or1200/mor1kx. gmon file
data writing for big endian targets. http://openocd.zylin.com/#/c/2168/
The procedure, is the one described in the "profiling" section of this
http://wiki.mintsoc.org/doku.php?id=run_programs_on_the_altera_de0_nano_board#profiling
Post by Iñigo Muguruza
Best regards,
Inigo Muguruza
_______________________________________________
OpenRISC mailing list
http://lists.openrisc.net/listinfo/openrisc
Hi Iñigo,
I don't have answer to your questions.
However, as you work on this, can you try to apply what is suggested
here: http://openocd.zylin.com/#/c/2168/1/src/target/target.c,cm
Thanks,
Franck.
_______________________________________________
OpenRISC mailing list
http://lists.openrisc.net/listinfo/openrisc
--
Jose T. de Sousa, PhD
Office: +351 213 100 213
R. Alves Redol 9
1000-029 Lisboa
Portugal
Stefan Kristiansson
2014-06-17 21:00:49 UTC
Permalink
On Tue, Jun 17, 2014 at 10:10 PM, Jose Teixeira de Sousa
Post by Jose Teixeira de Sousa
Unrelated to this, has anyone noticed that OR1200 simulates much faster in
Verilator as compared to MOR1KX?
No, rather the opposite (since OR1200 is executing code a lot slower).

Stefan
Jose Teixeira de Sousa
2014-06-17 23:18:32 UTC
Permalink
Hmmmm, actually I should have written "related" as this slowness was
observed while OpenOCD was doing the halt/resume sequence to get PC values
for profiling, as I believe you that in general Verilator goes faster on
MOR1KX.


On Tue, Jun 17, 2014 at 10:00 PM, Stefan Kristiansson <
Post by Stefan Kristiansson
On Tue, Jun 17, 2014 at 10:10 PM, Jose Teixeira de Sousa
Post by Jose Teixeira de Sousa
Unrelated to this, has anyone noticed that OR1200 simulates much faster
in
Post by Jose Teixeira de Sousa
Verilator as compared to MOR1KX?
No, rather the opposite (since OR1200 is executing code a lot slower).
Stefan
--
Jose T. de Sousa, PhD
Office: +351 213 100 213
R. Alves Redol 9
1000-029 Lisboa
Portugal
Stefan Kristiansson
2014-06-17 20:59:55 UTC
Permalink
On Tue, Jun 17, 2014 at 6:00 PM, Iñigo Muguruza
Post by Iñigo Muguruza
Dear OpenRISC community,
I write this mail to inform about an issue I have found when I analyzed the
outcome data coming from or1200 and mor1kx profiling. I am working in making
a OpenRISC tutorial to help people install and execute the diverse
open-soruce SoC tools. In this tutorial a profiling test for or1200/mor1kx
is made. Surprisingly, the results are really different, inducting to think
about a fault/bug. (results in the attached txts)
In the mor1kx, _reset and or1k_dmu_disable routines, takes really a big
percentage of the execution, while in or1200 is does not happen the same.
It shouldn't ever end up in or1k_dmmu_disable, so obviously
stalling/unstalling mor1kx at some particular spot triggers a bug.
I can reproduce it even manually by hitting ctrl-c + c in gdb
repeatedly many times.
I'll take a closer look what's happening.

Stefan
Stefan Kristiansson
2014-06-22 15:31:07 UTC
Permalink
On Tue, Jun 17, 2014 at 11:59 PM, Stefan Kristiansson
Post by Stefan Kristiansson
On Tue, Jun 17, 2014 at 6:00 PM, Iñigo Muguruza
Post by Iñigo Muguruza
Dear OpenRISC community,
I write this mail to inform about an issue I have found when I analyzed the
outcome data coming from or1200 and mor1kx profiling. I am working in making
a OpenRISC tutorial to help people install and execute the diverse
open-soruce SoC tools. In this tutorial a profiling test for or1200/mor1kx
is made. Surprisingly, the results are really different, inducting to think
about a fault/bug. (results in the attached txts)
In the mor1kx, _reset and or1k_dmu_disable routines, takes really a big
percentage of the execution, while in or1200 is does not happen the same.
It shouldn't ever end up in or1k_dmmu_disable, so obviously
stalling/unstalling mor1kx at some particular spot triggers a bug.
I can reproduce it even manually by hitting ctrl-c + c in gdb
repeatedly many times.
I'll take a closer look what's happening.
I think I've got this sorted out now and have committed the related fixes to:
https://github.com/openrisc/mor1kx

I get an output like this now:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
57.34 2068.55 2068.55 loop_b
21.29 2836.67 768.12 __uart_putc
16.75 3440.83 604.16 loop_a
1.06 3479.23 38.40 __sfvwrite_r
0.78 3507.39 28.16 memchr
0.71 3532.99 25.60 _write
0.43 3548.35 15.36 memmove
0.43 3563.71 15.36 strlen
0.28 3573.95 10.24 __sflush_r
0.28 3584.19 10.24 __swrite
0.14 3589.31 5.12 _fflush_r
0.14 3594.43 5.12 _puts_r
0.14 3599.55 5.12 puts
0.07 3602.11 2.56 __reset
0.07 3604.67 2.56 __sclose
0.07 3607.23 2.56 __uart_init

Please give it a go, and thanks a lot for bug reports like this, it's
much appreciated.

On a related note, while applying the profiling patch mentioned in the
original mail,
I had some troubles with it. The "pc" register always read out as 0,
so I just bluntly changed it to "npc".
And while doing that, I noticed that you can apply your target
specific profiling routines, instead of the silly halt/resume default
one.
Since we can very well read the npc spr while the target is running,
this is a much better option.
Franck cooked together a tentative patch for it:
https://github.com/fjullien/openOCD/commit/86ec95584aa239b0bb7fe8881a6582006487d31b
and I added this to make it work:
http://pastie.org/9304431

Stefan
Iñigo Muguruza
2014-06-24 11:51:13 UTC
Permalink
Hi Stefan,

i'm trying to verify your results, but I'm not able. I have used the mor1kx
v2, running José's patch. The profiling test runs but at the end of it, an
assertion error is occurring, so the gmon is not written.

OpenOCD side:
Info : Profiling completed. 482 samples.
target state: halted
openocd: target.c:3566: write_gmon: Assertion `addressSpace >= 2' failed.
Aborted


So I have cloned Franck's openOCD using or1k-profile branch and edit or1k.c
to insert the changes you made, but unfortunately, the "make" command is
not running...

cc -D_GNU_SOURCE -Wall -I. -g -O2 -c -o jim-subcmd.o jim-subcmd.c
cc -D_GNU_SOURCE -Wall -I. -g -O2 -c -o jim-interactive.o
jim-interactive.c
jim-interactive.c: In function ‘Jim_InteractivePrompt’:
jim-interactive.c:91:9: error: ‘JIM_VERSION’ undeclared (first use in this
function)
jim-interactive.c:91:9: note: each undeclared identifier is reported only
once for each function it appears in
make[2]: *** [jim-interactive.o] Error 1
make[2]: Leaving directory `/home/inigom/Desktop/openOCD/jimtcl'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/home/inigom/Desktop/openOCD'
make: *** [all] Error 2

Did you faced some of this errors??

Thanks in advance!

Iñigo Muguruza


2014-06-22 16:31 GMT+01:00 Stefan Kristiansson <
Post by Stefan Kristiansson
On Tue, Jun 17, 2014 at 11:59 PM, Stefan Kristiansson
On Tue, Jun 17, 2014 at 6:00 PM, Iñigo Muguruza
Post by Iñigo Muguruza
Dear OpenRISC community,
I write this mail to inform about an issue I have found when I analyzed
the
Post by Iñigo Muguruza
outcome data coming from or1200 and mor1kx profiling. I am working in
making
Post by Iñigo Muguruza
a OpenRISC tutorial to help people install and execute the diverse
open-soruce SoC tools. In this tutorial a profiling test for
or1200/mor1kx
Post by Iñigo Muguruza
is made. Surprisingly, the results are really different, inducting to
think
Post by Iñigo Muguruza
about a fault/bug. (results in the attached txts)
In the mor1kx, _reset and or1k_dmu_disable routines, takes really a big
percentage of the execution, while in or1200 is does not happen the
same.
It shouldn't ever end up in or1k_dmmu_disable, so obviously
stalling/unstalling mor1kx at some particular spot triggers a bug.
I can reproduce it even manually by hitting ctrl-c + c in gdb
repeatedly many times.
I'll take a closer look what's happening.
https://github.com/openrisc/mor1kx
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
57.34 2068.55 2068.55 loop_b
21.29 2836.67 768.12 __uart_putc
16.75 3440.83 604.16 loop_a
1.06 3479.23 38.40 __sfvwrite_r
0.78 3507.39 28.16 memchr
0.71 3532.99 25.60 _write
0.43 3548.35 15.36 memmove
0.43 3563.71 15.36 strlen
0.28 3573.95 10.24 __sflush_r
0.28 3584.19 10.24 __swrite
0.14 3589.31 5.12 _fflush_r
0.14 3594.43 5.12 _puts_r
0.14 3599.55 5.12 puts
0.07 3602.11 2.56 __reset
0.07 3604.67 2.56 __sclose
0.07 3607.23 2.56 __uart_init
Please give it a go, and thanks a lot for bug reports like this, it's
much appreciated.
On a related note, while applying the profiling patch mentioned in the
original mail,
I had some troubles with it. The "pc" register always read out as 0,
so I just bluntly changed it to "npc".
And while doing that, I noticed that you can apply your target
specific profiling routines, instead of the silly halt/resume default
one.
Since we can very well read the npc spr while the target is running,
this is a much better option.
https://github.com/fjullien/openOCD/commit/86ec95584aa239b0bb7fe8881a6582006487d31b
http://pastie.org/9304431
Stefan
Stefan Kristiansson
2014-06-24 13:20:35 UTC
Permalink
On Tue, Jun 24, 2014 at 2:51 PM, Iñigo Muguruza
Post by Iñigo Muguruza
Hi Stefan,
i'm trying to verify your results, but I'm not able. I have used the mor1kx
v2, running José's patch. The profiling test runs but at the end of it, an
assertion error is occurring, so the gmon is not written.
Info : Profiling completed. 482 samples.
target state: halted
openocd: target.c:3566: write_gmon: Assertion `addressSpace >= 2' failed.
Aborted
Yes, that's what I said in my previous mail, José's patch is not
working, I don't know how you got it working earlier.
I worked (read - hacked) around the problem by changing:
struct reg *reg = register_get_by_name(target->reg_cache, "pc", 1);
to
struct reg *reg = register_get_by_name(target->reg_cache, "npc", 1);
in src/target/target.c

Stefan
Iñigo Muguruza
2014-06-25 16:25:11 UTC
Permalink
Hi again,

I'm sending a small pdf with the results that I'm having. I have run the
same test (prof-test-printf.c) four times, changing the OpenOCD version
(José's patch, Franck's or1k.profile repo + Stefan changes ). The profiling
is not working really well, as you can see in the pdf, _fini routine is
taking a lot of the execution %. I also have attached the gmon files and
the c code, in case you are interested in.

The gmon files with mintOpenOCD = openOCD + José's patch.


best regards,

Iñigo Muguruza


2014-06-24 14:20 GMT+01:00 Stefan Kristiansson <
On Tue, Jun 24, 2014 at 2:51 PM, Iñigo Muguruza
Post by Iñigo Muguruza
Hi Stefan,
i'm trying to verify your results, but I'm not able. I have used the
mor1kx
Post by Iñigo Muguruza
v2, running José's patch. The profiling test runs but at the end of it,
an
Post by Iñigo Muguruza
assertion error is occurring, so the gmon is not written.
Info : Profiling completed. 482 samples.
target state: halted
openocd: target.c:3566: write_gmon: Assertion `addressSpace >= 2' failed.
Aborted
Yes, that's what I said in my previous mail, José's patch is not
working, I don't know how you got it working earlier.
struct reg *reg = register_get_by_name(target->reg_cache, "pc", 1);
to
struct reg *reg = register_get_by_name(target->reg_cache, "npc", 1);
in src/target/target.c
Stefan
Stefan Kristiansson
2014-06-26 12:55:38 UTC
Permalink
On Wed, Jun 25, 2014 at 7:25 PM, Iñigo Muguruza
Post by Iñigo Muguruza
Hi again,
I'm sending a small pdf with the results that I'm having. I have run the
same test (prof-test-printf.c) four times, changing the OpenOCD version
(José's patch, Franck's or1k.profile repo + Stefan changes ). The profiling
is not working really well, as you can see in the pdf, _fini routine is
taking a lot of the execution %. I also have attached the gmon files and the
c code, in case you are interested in.
I don't now what to say here, your results doesn't match at all the
results I get here doing the exact same thing.
Can you double-check that you are in fat running the elf from the .c
that you attached and then also provide your elf for me to test with?

With Jose's patch (stall/resume) I get this:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
38.88 1799.86 1799.86 __uart_putc
35.34 3435.84 1635.98 loop_b
18.64 4298.56 862.72 loop_a
1.77 4380.48 81.92 __sfvwrite_r
1.60 4454.72 74.24 _write
1.00 4500.80 46.08 memchr
0.77 4536.64 35.84 memmove
0.50 4559.68 23.04 strlen
0.44 4580.16 20.48 _fflush_r
0.33 4595.52 15.36 __swrite
0.22 4605.76 10.24 __sflush_r
0.22 4616.00 10.24 _puts_r
0.11 4621.12 5.12 _write_r
0.06 4623.68 2.56 __reset
0.06 4626.24 2.56 __uart_init
0.06 4628.80 2.56 puts


With Franck's+mine patch (sampling npc while running) I get this:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls Ts/call Ts/call name
53.03 2107.07 2107.07 __uart_putc
25.38 3115.71 1008.64 loop_a
14.89 3707.21 591.50 loop_b
1.55 3768.65 61.44 __sfvwrite_r
1.29 3819.85 51.20 _write
0.90 3855.69 35.84 memmove
0.64 3881.29 25.60 __sflush_r
0.52 3901.77 20.48 strlen
0.45 3919.69 17.92 _write_r
0.32 3932.49 12.80 memchr
0.26 3942.73 10.24 __swrite
0.26 3952.97 10.24 _fflush_r
0.26 3963.21 10.24 _puts_r
0.06 3965.77 2.56 __uart_init
0.06 3968.33 2.56 main
0.06 3970.89 2.56 memcpy
0.06 3973.45 2.56 puts

A note on the 'sampling npc while running' patch, it still needs some
fine tuning, since it quits after 10000 samples and it goes pretty
fast to collect that when there is no delays (i.e. it will not run for
300 s).

Stefan

Loading...