Integrate new thread affinity interface into LIBBU and LIBRTBRL-CAD
Status: ClosedTime to complete: 72 hrs Mentors: SeanTags: multithreading, SMP, integration, cleanup

Work was completed recently that successfully demonstrated the value of implementing threading affinity:

  • http://www.google-melange.com/gci/task/view/google/gci2012/7960222

This task involves cleaning up that work and polishing it into production-quality code.  That nominally involves the following:

  • hooking behind bu_parallel(), not the individual worker functions
  • unwrapping the new function so it's always available
  • documenting the function properly (missed that mistake before closing the task)
  • pulling the static globals into the callback in parallel.c
  • creating a separate callback that takes an arg
  • make the one with static data call the new arg one
  • updating the callback to utilize the arg parameter, pass it during pthread_create()
  • update the other platforms to do the same where we can, and finally
  • sorting out what to do with _GNU_SOURCE (since just testing means it'll never be set)

Submit your work as a single patch file.  Note this task requires meticulous attention to detail.

Uploaded Work
File name/URLFile sizeDate submitted
affi.patch2.3 KBDecember 16 2012 14:39 UTC
po.patch4.5 KBDecember 17 2012 16:55 UTC
po2.patch1.7 KBDecember 18 2012 12:18 UTC
po3.patch1.3 KBDecember 18 2012 16:13 UTC
po4.patch4.2 KBDecember 18 2012 19:04 UTC
po5.patch4.6 KBDecember 19 2012 11:42 UTC
po6.patch4.5 KBDecember 19 2012 11:44 UTC
po7.patch4.6 KBDecember 19 2012 11:49 UTC
po8.patch4.6 KBDecember 19 2012 15:56 UTC
po9.patch4.4 KBDecember 19 2012 17:56 UTC
po10.patch4.7 KBDecember 19 2012 18:32 UTC
parallel.tar.gz2.9 KBDecember 19 2012 18:32 UTC
po11.patch8.5 KBDecember 20 2012 12:21 UTC
po12.patch8.8 KBDecember 20 2012 12:25 UTC
po13.patch8.8 KBDecember 20 2012 17:13 UTC
po13.patch8.9 KBDecember 20 2012 17:15 UTC
po14.patch8.8 KBDecember 20 2012 18:11 UTC
po15.patch9.2 KBDecember 20 2012 18:58 UTC
Comments
Skriptkidon December 14 2012 16:40 UTCTask Claimed

I would like to work on this task.

Andrei Popescu on December 14 2012 16:50 UTCTask Assigned

This task has been assigned to Skriptkid. You have 72 hours to complete this task, good luck!

Skriptkidon December 14 2012 17:04 UTCDoubts

1. Where in bu_parallel should bu_set_affinity be called?


2. By unwrapping the function, do you mean I need to ake out the preprocessor conditional statment and make it available irrespective of the availability of pthread. and GNU_SOURCE extensions?


3. I've done the documentation as good as I can, but please do tell what is lacking.


4. Understood nothing in points 4-8 :) I know what staic global variables are and what callbacks are, but need more explanation on those points.


5. If GNU_SOURCE is not used, compilation cannot be used. pthread_setaffinity_np() and pthread_getaffinity_np() needs this to be defined as they use some GNU extensions.

Sean on December 14 2012 17:58 UTCyou may need to take this one step at a time

So know that you may be a little out of your depth taking this task on which is why I was going to do it instead of making it a task anyways, but you're welcome to try.  Just be aware that you're going to need to do some research and, most importantly, just take things one step at a time.


#1: To answer your first question, you need to have a basic understanding of what bu_parallel() does.  Read through that function from top to bottom.  Pay particular attention to the pthreading section but notice how it's similar to several of the other sections.  Notice the pthread_create() function.  Read the man page for that function and understand every parameter being passed and what they mean because you're going to later need to change that (but not just yet).  That should be enough to answer the first question.


#2: Yes and no.  The function must be available.  Doesn't mean that it has to do anything useful, but it must exist.  It also must still compile on platforms where that function is not available too.


#3: It's in the wrong place.


 


Start there and some of the 4-8 points might begin to make more sense.

Skriptkidon December 15 2012 12:17 UTCPthread

pthread_set/get_affinity() are POSIX threads library functions. So I'm guessing they need to be implemented in the section where pthread_create is being used(in #if defined(HAVE_PTHREAD_H) block) and not before it in the SunOS/Solaris section, where thr_create is being used. Am I right?

Skriptkidon December 15 2012 12:24 UTCRead up on pthread_create

I read up on pthread_create and it's parameters and have gone through parallel.c. Must admit, most of it seemed gibberish to me(stacks, concurrency, semaphore, etc), since I've never really dealt with so much low-level programming. But if I've understood right, bu_parallel is a function for creating a thread, invokes parallel_interface and manage the thread.(Differently on different platforms.) Am I right?

Sean on December 16 2012 05:44 UTCon the right track

You're on the right track.  So for pthreads (which are in the POSIX threads library, but are a GNU extension), it ends up creating a new thread that invokes parallel_interface().  That's the key.  You need to set the affinity within the thread, and all threads invoke parallel_interface().


 

Skriptkidon December 16 2012 06:05 UTCCall set_affinity in parallel_interface?

Points 4-8 are starting to make sense now, as you said. Since all threads invoke parallel_interface, I'm assuming parallel_interface needs to call bu_set_affinity. Are you saying(in points 4-8) that parallel_interface, which currently takes no arguments, needs to take callback to a function as an argument and bu_set_affinity has to be passed to parallel_interface as a callback in pthread_create?

Skriptkidon December 16 2012 13:32 UTCOr...

I just saw the section "un-published globals." You mean I need to make it a call back there?

Skriptkidon December 16 2012 14:39 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 16 2012 14:41 UTCProgress

I know it's not complete, but posted to check if I've made any progress in figuring it out.

Sean on December 17 2012 04:32 UTCyou assume correctly

You assume correctly about calling from within parallel_interface.  You have the right idea about taking advantage of the ability to pass an argument.  What it amounts to is making it so that parallel_func and parallel_arg are not used for platforms like pthread that can take an arg.  You'll probably have to create a struct containing the callback, the arg, and possibly the cpu ID.


 


 

Sean on December 17 2012 04:32 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Sean on December 17 2012 04:39 UTCpatch

As for your patch, put your new affinity function into a new file (see sh/template.sh).  You added the bu_set_affinity() function in the right function, but it's not right.

Sean on December 17 2012 04:39 UTCDeadline extended

The deadline of the task has been extended with 2 days and 0 hours.

Skriptkidon December 17 2012 12:17 UTCDid not understand.

1. Should I modify parallel.c to something similar to this:


 


-------------------------------------------------------------------------------------------------------------------


....


....


struct something


{


    int (*func_ptr)(void);


    data_type arg;


    data_type cpuid;


}


....


struct something some_obj;


some_obj.func_ptr = bu_set_affinity;


.....


pthread_create(..., ..., parallel_interface, some_obj)


......


......


---------------------------------------------------------------------------------------------------------------------


 


2. And then modify parallel_func(void) to:



----------------------------------------------------------------------------------------------------------------------


....


....


parallel_func(struct something some_obj)


{


    ....


    some_obj.(*func_ptr)(void);


    ....


}


....


....


-----------------------------------------------------------------------------------------------------------------------


(Don't worry about syntax or styling. I just need to get the idea.)


 


If not, what should I do? I not able to understand your instructions.(My bad, not blaming you.)

Skriptkidon December 17 2012 12:27 UTCbu_set_affinity

If I put it in a different file, how can I make the other source files access it? Something to do with Cmake? Just tell me where I need to start looking for how to do this.(What I need to Google for, what files I need to look at, that sort of stuff.) I can carry on from there.

Daniel Rossberg on December 17 2012 12:36 UTCAdd it to the file list in CMakeLists.txt

and add the function prototype to an appropriate header file.

Skriptkidon December 17 2012 15:56 UTCThanks :)

Would you know anything about the question above it also?

Sean on December 17 2012 16:08 UTCapparently my reply was eaten

You're on the right track with the struct, but not how you're using it.  Right now, bu_parallel() sets the two static global variables, calls parallel_interface() (via pthread_create()), that calls the user's callback function (that was set in the static global variable).


You're going to change that so that the static global variables are not used.  To do that, bu_parallel() will instead use your struct, set the *users* function in there, pass that to parallel_interface(), and it'll get called.


 

Daniel Rossberg on December 17 2012 16:12 UTCI'm not sure

but shouldn't you set the affinity before doing the work (parallel_func)?

Skriptkidon December 17 2012 16:14 UTCGot it

But should the function type in the struct also be of void type? If so, currently bu_set_affinity is an int. Should I make t void and use function bu_log(which I've so far seen in many places to log errors) to log any errors?

Skriptkidon December 17 2012 16:52 UTCRough draft

As per what you've told, this is what I've done. Yes, I need to pu bu_set_affinity in a different file and yes, #define _GNU_SOURCE needs something to be done, but I just want to know if I've done the implememtaion right on the parallel.c side of things.

Skriptkidon December 17 2012 16:55 UTCReady for review

The work on this task is ready to be reviewed.

Sean on December 17 2012 17:37 UTCon the right track

It looks like you're on the right track though a couple comments and questions are raised.  First comment, you'll still going to need to have a version of the parallel_interface callback that takes no parameters because there are a lot of other platform implementations that will get broken if it doesn't exist.  You should make the one that takes no arguments call your modified parallel_interface.


The other question is what cpu_id was set to before getting to the callback function.  If that value can be set in bu_parallel, even better so as to avoid using global.


 

Sean on December 17 2012 17:37 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 18 2012 11:23 UTCBut...

If I create a new modified parallel_interface_arg that takes arg and call it in parallel_interface, the struct will have to be passed to parallel_interface_arg by parallel_interface. If parallel_interface has to pass the struct to parallel_interface_arg, parallel_interface needs to take a struct as an argument, since it needs to know what struct it has to pass to parallel_interface_arg. But, since parallel_interface needs to take no arguments, the struct has to be made global, and the whole point is to avoid usage of globals, isn't it?

Skriptkidon December 18 2012 12:20 UTCpo2.patch

That'll give make what I was saying clearer. Also, where is bu_set_affinity implemented in all this? the new struct is for the user's func(passed to bu_parallel) to be called in parallel_interface. But how and where is bu_set_affinity called?

Skriptkidon December 18 2012 12:28 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 18 2012 12:38 UTCcpuid

Before implementing the structure, cpuid ID was set to parallel_nthreads_started. Now, since parallel-nthreads_started itself is used in bu_parallel, and a local variable's value isn't assigned to it in bu_parallel(like parallel_func was done), removing this global means the 'cpuid' object of the struct thread_data has to be used instead throught the fuction. Will this be alright?

Sean on December 18 2012 14:24 UTCnot so

Responding to "But...": parallel_interface() will need to pass the struct to parallel_interface_arg() but that doesn't mean it needs to take the struct as an argument.  It's still the arg-less function used by old platforms and will still have values set in the globals.  It can fill in the struct just like how bu_parallel() can fill in the struct.  It certainly doesn't need to be global in that context.


I'm not sure I understand your cpuid question as stated.  The general idea is that bu_parallel() knows how many threads it has created and can pass that ID value to the thread instead of the thread pulling it from a static global that all threads acquire a lock around and share.


Stated earlier, the new affinity function should be in its own new file: src/libbu/affinity.c


 

Sean on December 18 2012 14:25 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 18 2012 15:54 UTCGlobals

Currently, parallel.c has two global variable parallel_func and parallel_arg. These two are used by parallel_func. Parallel_interface uses the pointer parallel_func, which is the pointer to the user's function, to call the user function. Are you saying I needn't modify any of this, except that instead of parallel_interface directly calling parallel_func, it needs to fill in the struct using the globals parallel_func and parallel_arg and then pass these to parallel_func_arg, which will read the passed struct and thus calls  the user's function? 

Skriptkidon December 18 2012 16:13 UTCAffinity.c

This part is done. I've put it in a seperate file, added the declaration in bu.h. I'll submit those also once this part is cleared. Can you tell me what I need to do with respect to the new patch I've submitted?(Sorry, I'm unable to understand what the new mechanism needs to be.)

Skriptkidon December 18 2012 16:13 UTCReady for review

The work on this task is ready to be reviewed.

Sean on December 18 2012 16:43 UTCthat's part of it

So that looks like part of it.  You still need to modify bu_parallel() to call parallel_interface_arg() where possible instead of parallel_interface().  There are at least two or three platforms where you can pass a pointer to a thread_data struct that you filled in.  For those platforms, they shouldn't need to use the globals.

Sean on December 18 2012 16:43 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 18 2012 18:56 UTCfromParallelInt

had to introduce that variable in the struct to avoid additional unwanted incrementations and certain function calls I thought were necessary(looking at parallel_interface.) parallel_interface_arg is called directly by 3 platforms - SunOS(thr_create), POSIX threads(pthread_create) and Windows(CreateThread). Tell what else needs to be done. Also, there might be a slight styling issue in the second-level preprocessor in parallel_arg. Didn't know how that needs to done(HACKING doesn't say much about preprocessors.)

Skriptkidon December 18 2012 19:04 UTCReady for review

The work on this task is ready to be reviewed.

Sean on December 18 2012 20:01 UTCdon't see the point

I don't see the point of fromParallelInt.  Why can't cpu_id just get set within bu_parallel()?  Avoiding those semaphore locks in parallel_interface_arg() was an intentional goal. 


If it's a platform that uses parallel_interface(), the static globals still get used and the counter incremented as before.  If it's a new platform using parallel_interface(), no static variable needs to be touched and it doesn't matter if they go unused.


Since the SGI case is one of the parallel_interface() args, that logic shouldn't also be in _arg().


Basically, everywhere you've replicated a section of code, it's not right. :)


 

Sean on December 18 2012 20:01 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 19 2012 11:42 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 19 2012 11:47 UTCFixed

Okay, that's fixed. For all the platforms that call parallel_interface_arg directly, the cpu_id is assigned a value in bu_parallel_interface itself. The globals aren't touched. Now, there's a single structure now in bu_parallel, rather than different structures for different platforms, as I had done previously(which I felt is unecessary.) Also, found some tiny mistake in the error statment in POSIX section. Error message said 'thr_create failed ......' I made that 'pthread_create failed ..........'

Sean on December 19 2012 15:26 UTCnicely done

That's really looking great Akshay!  Looks like this is wrapping up nicely.  I only have one remainging question for you that I don't know the answer to myself.


So bu_parallel() has a user_thread_data_bu struct that is getting passed to thr_create(), pthread_create(), and CreateThread().  You reuse that struct for every thread, but does each thread retain its own copy?


Since it's the same struct that we're passing a pointer for and later incrementing that same cpu_id in the thread creation loops, is the cpu_id being incremented for all threads or is copied.  A quick review of the docs for each of those three functions should confirm how that is handled.  You may need to create N structs so each one is independent.


Note that you have a typo on two of the user_thread_data.cpu_id++'s where it's missing the _bu.


 

Sean on December 19 2012 15:26 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Sean on December 19 2012 15:27 UTCDeadline extended

The deadline of the task has been extended with 1 days and 0 hours.

Skriptkidon December 19 2012 15:55 UTCMaybe not

Each thread retaining it's own copy of the struct might not be needed. This is pass by reference, agreed, but since it isn't being modified by parallel_interface or parallel_interface_arg, the only places where it is being accessed, I don't think that will be necessary. The struct is required only to tell the thread which function has to be called, what the arg is and what the thread ID is, so N structs would be a waste of resources. Think of the struct as being only read and not written to by the threads. Only data we set is read and no modification is done.


 


If there's any flaw in this theory of mine or if you'd still prefer each thread having it's own copy, please tell me, and I'll do it :)

Skriptkidon December 19 2012 15:56 UTCReady for review

The work on this task is ready to be reviewed.

Sean on December 19 2012 16:56 UTCBut the parent thread is modifying it

Even if they're read-only, the parent thread modifies the value and uses the cpu_id to set the affinity (among other things).  If it's not a per-thread instance, then there indeed is a problem since the scheduler could pause a thread after creation (context switch, cpu load, lots of threads, whatever) yet the main loop continues to increment cpu_id and create more threads.  It'd be technically possible for all threads to read cpu_id as the max value, which would be bad (with affinity, it's reduce computation to a single processor).


 

Sean on December 19 2012 16:59 UTCa quick search

A quick search on pthread_create() and it indeed is shared data, so this is a bug.  There needs to be a separate structure for each thread being created.  Don't forget to deallocate since you'll probably want to allocate on the heap. :) 

Sean on December 19 2012 17:00 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 19 2012 17:55 UTCModified

Now the structure is modified to structure array with 'ncpu' elements, wheer each element is for each thread.(Since ncpu threads are created.) Also, the cpu_id is initialised in a loop, through 0 to (ncpu-1), along with the other members, rather than increment while thread creation. So cpu_id is now fixed for all threads.

Skriptkidon December 19 2012 17:56 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 19 2012 18:34 UTCparallel.tar.gz

po10.patch is the patch only for parallel.c. Contains the struct fix. parallel.tar.gz has affinity.c(the new one I created) and a patch which adds declaration of bu_set_affinity to bu.h and the file name to CMakeLists.txt.

Sean on December 19 2012 20:01 UTCa single patch file

You can create a single file just by running svn add src/libbu/affinity.c before running svn diff.  That will include it in the patch file and is easier to review/apply than the tarball.


The patch looks good except for the [ncpu] array.  It's not portable to use variadic static allocation and our compilation settings will trip on that as a failure for several deployment environments.  It's valid c99, but not all compilers are complaint and not a standard we can even require due to Windows lacking support for other c99 features.  It'll need to be on the heap.


 

Sean on December 19 2012 20:01 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 20 2012 01:57 UTCHeap

Could you explain what the 'heap' is and how I can use that to set the Struct?

Sean on December 20 2012 03:35 UTCheap

heap allocation == dynamic allocation


i.e., malloc/free (or in this case bu_calloc/bu_free)


 

Skriptkidon December 20 2012 12:21 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 20 2012 12:27 UTCHeap

made the user_thread_data_bu a pointer now. bu_calloc'ed it to hold 'ncpu' objects. It's bu_free'ed at the end.

Daniel Rossberg on December 20 2012 13:09 UTCWe can't accept your copyright

You have to assign it to the U.S. Government.  See HACKING and sh/template.sh.

Daniel Rossberg on December 20 2012 13:09 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Daniel Rossberg on December 20 2012 13:11 UTCDeadline extended

The deadline of the task has been extended with 1 days and 0 hours.

Skriptkidon December 20 2012 17:10 UTCHehe

Sorry. Had put my name in template.sh. Will change it.

Skriptkidon December 20 2012 17:16 UTCChanged

Changed the copyright. Also, bu_set_affinity has been implemented in parallel_interface and parallel_interface_arg(Or was this not supposed to be done?)

Skriptkidon December 20 2012 17:17 UTCChanged

Changed the copyright. Also, bu_set_affinity has been implemented in parallel_interface and parallel_interface_arg(Or was this not supposed to be done?) See the second 'po13.patch'

Skriptkidon December 20 2012 17:17 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 20 2012 18:13 UTCRemoved redefiniton

in po14.patch, Removed the block


#ifdef _GNU_SOURCE


#  define _GNU_SOURCE


#endif


 


where _GNU_SOURCE was being redefined, as pointed out by d_rossberg. This was due to my mistake in understanding the logic of #ifdef.

Daniel Rossberg on December 20 2012 18:20 UTCCompilation errors

/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c: In function ‘parallel_interface’:


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c:595:5: error: ISO C90 forbids mixed declarations and code [-Werror=edantic]


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c:604:5: error: implicit declaration of function ‘parallel_interface_arg’ [-Werror=implicit-function-declaration]


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c: At top level:


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c:622:1: error: conflicting types for ‘parallel_interface_arg’ [-Werror]


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c:604:5: note: previous implicit declaration of ‘parallel_interface_arg’ was here


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c: In function ‘bu_parallel’:


/home/rossberg/Devel/BRL-CAD/brlcad/src/libbu/parallel.c:711:5: error: ISO C90 forbids mixed declarations and code [-Werror=edantic]


cc1: all warnings being treated as errors

Daniel Rossberg on December 20 2012 18:20 UTCTask Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Skriptkidon December 20 2012 18:51 UTCReady for review

The work on this task is ready to be reviewed.

Skriptkidon December 21 2012 15:19 UTCCompilation Errors

Compilation errors fixed

Melange on December 21 2012 17:03 UTCNo more Work can be submitted

Melange has detected that the deadline has passed and no more work can be submitted. The submitted work should be reviewed.

Sean on December 22 2012 13:20 UTCTask Closed

Congratulations, this task has been completed successfully.