Menu
Logged-In As
ACCOUNTNot Logged In
Investigate performance of setting thread affinityBRL-CAD
Status: ClosedTime to complete:
72 hrs
Mentors: Sean
BRL-CAD's raytrace library (LIBRT) is pervasively multithreaded using routines defined in our basic utility library (LIBBU) for detecting an using multiple CPUs/cores/threads. On Linux, BSD, or Mac OS X, you can set the affinity of a process to stay on a processor. This task involves making minor modifications to the LIBBU parallel interface using sched_setaffinity and/or pthread_attr_setaffinity_np (or similar affinity mechanism depending on the platform) and then evaluating the performance impact using our BRL-CAD Benchmark suite ('benchmark' command). Code:
|
Uploaded Work
File name/URL | File size | Date submitted | |
---|---|---|---|
parallel.patch | 2.4 KB | December 09 2012 18:32 UTC | |
aff.tar.gz | 5.0 KB | December 10 2012 13:07 UTC | |
new.patch | 2.8 KB | December 10 2012 16:16 UTC | |
aaf_new.tar.gz | 4.5 KB | December 10 2012 16:31 UTC | |
aff_latest.tar.gz | 5.0 KB | December 10 2012 16:43 UTC | |
aaf_latest.tar.gz | 4.7 KB | December 10 2012 17:06 UTC | |
newp.patch | 2.9 KB | December 11 2012 11:35 UTC | |
fixed.patch | 2.8 KB | December 11 2012 16:28 UTC | |
per3.patch | 3.8 KB | December 12 2012 18:45 UTC | |
per2.patch | 3.8 KB | December 12 2012 18:45 UTC | |
per3.patch | 3.9 KB | December 12 2012 18:49 UTC | |
per2.patch | 3.9 KB | December 12 2012 18:49 UTC | |
per3.patch | 3.9 KB | December 12 2012 18:51 UTC | |
per2.patch | 3.9 KB | December 12 2012 18:51 UTC |
I would like to work on this task.
This task has been assigned to Skriptkid. You have 72 hours to complete this task, good luck!
I'm writing a funtion which does the job of setting the affinity mask. But where is this function called. And where am I supposed to write this? In parallel.c or semaphore.c?
The work on this task is ready to be reviewed.
the flag '-lpthread' has to be passed to gcc. Otherwise, this won't compile.
I.e. we need 2 outputs: benchmark with and without your changes to parallel.c
One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.
Where should the function for setting the affinity be called in parallel.c? At the beginning? At the end?
The work on this task is ready to be reviewed.
However ther are some shortfalls
One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.
I didnt understand the first part. If i don't include pthread.h, it won't compile.
The work on this task is ready to be reviewed.
Now the function has been moved to parallel.c. they are not in worker.c or view.c. Just one function prototype in parallel.c
One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.
aaf_new.tar.gz is the latest. The specified fixes have been done. Also, a small change to the note. The function prototype is in worker.c and view.c no more, but in parallel.c. It is only called in worker.c and view.c
The work on this task is ready to be reviewed.
aaf_latest.tar.gz is the latest. **** DO NOT USE ANYTHING SUBMITTED BEFORE THAT. IGNORE ALL OTHER FILES **** Sorry for any confusion.
Removed it, but I doubt it will compile without that statement. It doesn't on my computer. Shall I include it in the #ifdef HAVE_PTHREAD_H or #ifdef linux condition block?
Skriptkid, the indentation and style is STILL wrong... :) Please review our HACKING file for details, but open curly braces go on the same line as their statements (except for functions) and sequences of 8 spaces need to be tabs (we use tab+space to indent levels are 4 spaces, 1 tab, 1tab+4spaces, 2 tabs, 2tabs+4spaces, etc). Getting the style right isn't optional.
The implementation seems reasonable, but the testing seems a bit flawed. Setting affinity within a VM isn't likely to make a difference (and the benchmark numbers reflect that). Moreover, the patch only includes changes to parallel.c and not any other files you changed. Run "svn diff" from the top src directory to get a patch for all changes.
One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.
The deadline of the task has been extended with 1 days and 0 hours.
Could you tell me if the patch is alright now? ('newp.patch') I'll run a new benchmark and submit that too.(Because I'm downloading brlcad source again. Can't port it from VM.) Also, is it okay if I test it in a Mac? Using make bechmark, of coure.
The work on this task is ready to be reviewed.
I refer here to the lines in your "newp.patch" file. Open it with a text editor and you'll see what I mean:
One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.
The work on this task is ready to be reviewed.
Fixed all that you said. I'm getting a 'malformed patch.' Tell me if you're getting it too. I'll look into that.
The patch applied fine for me but there are several compilation problems. How are you compiling? If you've disabled strict compilation, you need to re-enable that and fix the issues reported. There are several issues, so perhaps try compilation on Mac as well as that should uncover most of them (Mac does not even have the pthread affinity functions() so you'll have to wrap the implementation in #ifdef protections and probably add a function test to the top-level CMakeLists.txt file.
One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.
The deadline of the task has been extended with 1 days and 0 hours.
So Skriptkid, a little encouragement perhaps. I applied your patch and did a real quick-fix for the various build issues I alluded to (which you should still fix properly). I then tested the performance impact on a real compute system.
On a 64-processor POWER7 beast, I'm seeing our *debug* benchmark results going from around 44000 VGRs to around 67000 VGRs. That's more than a 50% performance increase, but wait, it gets better...
Our *optimized* production benchmark results increase from around 121000 VGRs to 185000 VGRs!
That. Is. Awesome.
Your patch is going to make some big attention in our next major release. I'll also be adding extra tasks to implement thread affinity for BSD and Windows.
Amazing to hear that! :D Look at per3.patch and per2.patch. I've made some fixes. I've added #if defined()s. So only if pthread support and gnu source are found, that function will be run, else, it won't even have a prototype.(Tell me if there are styke coding issues with the preprocessor commands, I saw the AHCKING file, but fonudn nothing about them. I had to see other source files and make deductions.) Also, I'm really sorry for the late progress. I have my semester exams.
Difference between the two patches is that per2.patch contains some unecessary addition and deletion(of the same) lines. I know that has to be fixed, but just use this to test the compilation. per3.patch has all those issues addressed.(But maybe malformed.) And the changes made compile even in strict mode. Checked it. probably, no '#define _GNU_SOURCE' was causing compilation failure(I was asked to remove it, but now I've put it an #if statement.)
Use the last two per2.patch and per3.patch. Not the first two.
The work on this task is ready to be reviewed.
You left a stray #define GNU_SOURCE in there, but it otherwise looks okay. The preprocessor work needs some work (the API function should always exist) and it's probably worthwhile to just hook it in behind all bu_parallel() calls, but that change can be made on commit. Thanks for your efforts!
If you have access to other platforms, there are two new affinity tasks to implement support for Windows and BSD now because of your success.
Congratulations, this task has been completed successfully.
Thank you :) I have access to Windows, so yes, I'll take it up. But I'm unable to find it on the tasks list. Is it still on the way?
Please inform me ASAP. I really wanna take that up.
Joining IRC, asking a question, and leaving is like walking into a room, farting, and leaving! :)
As you can imagine, that's not very polite. :) IRC isn't just for you to use when it's convenient to get a quick response. If you're going to join, you'll need to wait for a response. Here or our mailing list are the places to ask without waiting. I mention this because you've done the ask-and-bolt several times now.
The other tasks are not yet posted because this patch has to be integrated first. The follow-up tasks will need to build on that committed code so we don't have integration work. If you want to clean up this patch with the things I mentioned, that would certainly speed things along for posting the other two affinity tasks:
Basically, just a little bit of cleanup in that file. If you start working on this, let me know and I can create a task for it, but you'll need to be meticulous and careful since the goal is proper cleanup. :)
I'm really sorry about that. Won't happen again. Also, I'll start the cleanup. Where do I upload the patches since this is closed?
If you think you're up to it, http://www.google-melange.com/gci/task/view/google/gci2012/8051203
As most of those changes involve cleaning up parallel.c which is a critical piece of our code, don't hesitate to discuss your progress and ask questions. We don't want to have to iterate a dozen times through the GCI interface (especially on whitespace/style issues!) if we don't have to.
Where do I call bu_set_affinity in bu_parallel? At the beginning of the function?
You'll need to read through and try to understand what bu_parallel() does, at least for the pthreads case. Pay particular attention to the pthread_create() function and what every parameter means (man pthread_create).