User:Marco-domingues/GSoC17/Log


* Working on an algorithm to create the list with all the boolean regions involved in the partition. This requires iterating over all the segments of the partition and over all of the regions, which may not be the best solution yet.
=== 4-5 July ===
* Created a regiontable bitvector to hold the boolean regions involved in each partition. The regiontable is needed to evaluate each partition only against the regions involved in it, and to resolve overlapping partitions.
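As a rough illustration of the regiontable idea, the sketch below packs one bit per region into 32-bit words, so a partition only needs to be evaluated against the regions whose bits are set. The macro and function names (REGIONTABLE_WORDS, regiontable_set, regiontable_test) are hypothetical and are not taken from the BRL-CAD sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: one bit per region, packed into 32-bit words. */
#define WORD_BITS 32u
#define REGIONTABLE_WORDS(nregions) (((nregions) + WORD_BITS - 1u) / WORD_BITS)

/* Clear the table before collecting the regions of a new partition. */
static void regiontable_clear(uint32_t *rt, size_t nregions)
{
    memset(rt, 0, REGIONTABLE_WORDS(nregions) * sizeof(uint32_t));
}

/* Mark 'region' as involved in the partition. */
static void regiontable_set(uint32_t *rt, size_t nregions, size_t region)
{
    assert(region < nregions);
    rt[region / WORD_BITS] |= 1u << (region % WORD_BITS);
}

/* Return 1 if 'region' is involved in the partition, 0 otherwise. */
static int regiontable_test(const uint32_t *rt, size_t nregions, size_t region)
{
    assert(region < nregions);
    return (int)((rt[region / WORD_BITS] >> (region % WORD_BITS)) & 1u);
}
```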
=== 6 July ===
* Found and fixed a bug in the insertion of partitions that generated partitions with wrong 'back_pp/forw_pp' pointers. It is now possible to iterate over the partition list sequentially in the 'eval_partitions' kernel, and to test the partitions against additional conditions before evaluation.
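The 'back_pp'/'forw_pp' fields link each partition to its neighbours, so both neighbours have to be updated on every insertion for the list to be walkable in order. Below is a minimal sketch assuming partitions live in a flat array and the links are array indices, with a hypothetical NIL marker for "no partition"; it is not the actual BRL-CAD code, and the struct keeps only the fields needed for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NIL UINT32_MAX  /* hypothetical "no partition" marker */

struct partition {
    double inhit_dist;  /* entry distance along the ray */
    uint32_t back_pp;   /* index of the previous partition, or NIL */
    uint32_t forw_pp;   /* index of the next partition, or NIL */
};

/* Insert partition 'idx' into the list starting at *head, keeping the
 * list sorted by inhit_dist. Both neighbours' links are updated so the
 * list can be walked forwards and backwards. */
static void insert_partition(struct partition *pp, uint32_t *head, uint32_t idx)
{
    uint32_t prev = NIL;
    uint32_t cur = *head;

    assert(idx != NIL);
    while (cur != NIL && pp[cur].inhit_dist <= pp[idx].inhit_dist) {
        prev = cur;
        cur = pp[cur].forw_pp;
    }

    pp[idx].back_pp = prev;
    pp[idx].forw_pp = cur;
    if (cur != NIL)
        pp[cur].back_pp = idx;
    if (prev != NIL)
        pp[prev].forw_pp = idx;
    else
        *head = idx;  /* new closest partition */
}
```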
=== 7 July ===
* Removed the pointers from the bool_region structure. Instead, I created an OpenCL buffer that contains the boolean trees of all the regions, and I use another buffer to store the offset of each region's tree.
* Worked on the shading of evaluated partitions. Previously I was wrongly shading segments from the evaluated partitions; now I shade using the 'inhit' information, and the results seem to match the ANSI C results. There are some slight differences in the illumination, but those differences were already there before.
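To illustrate the buffer layout from the 7 July entry, here is a minimal sketch assuming every region's boolean tree is stored in postfix (RPN) order, all trees are concatenated into one array, and a second array holds the offset where each region's tree starts. The node encoding (enum rpn_op, struct rpn_node) and the function name are hypothetical; only the single-buffer-plus-offsets idea comes from the log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical RPN encoding: a leaf pushes the presence of a primitive,
 * an operator pops two operands and pushes the result. */
enum rpn_op { RPN_LEAF, RPN_UNION, RPN_INTERSECT, RPN_SUBTRACT, RPN_END };

struct rpn_node {
    uint8_t op;    /* one of enum rpn_op */
    uint32_t bit;  /* primitive bit, used only by RPN_LEAF */
};

#define RPN_STACK_MAX 32

/* Evaluate the tree of 'region', stored in the shared 'trees' buffer
 * starting at trees[offsets[region]] and terminated by RPN_END.
 * 'in_partition' is a bitarray of the primitives present. */
static int eval_region_rpn(const struct rpn_node *trees, const uint32_t *offsets,
                           size_t region, const uint32_t *in_partition)
{
    int stack[RPN_STACK_MAX];
    int sp = 0;

    for (const struct rpn_node *n = trees + offsets[region]; n->op != RPN_END; n++) {
        if (n->op == RPN_LEAF) {
            assert(sp < RPN_STACK_MAX);
            stack[sp++] = (int)((in_partition[n->bit / 32u] >> (n->bit % 32u)) & 1u);
            continue;
        }
        assert(sp >= 2);
        int rhs = stack[--sp];
        int lhs = stack[--sp];
        switch (n->op) {
        case RPN_UNION:     stack[sp++] = lhs || rhs; break;
        case RPN_INTERSECT: stack[sp++] = lhs && rhs; break;
        case RPN_SUBTRACT:  stack[sp++] = lhs && !rhs; break;
        default:            assert(0); break;
        }
    }
    assert(sp == 1);
    return stack[0];
}
```

In this layout a work-item only needs the offset of a region to find its tree, so no pointers have to be shared between the host and the device.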
=== 10 July ===
* Debugged the code trying to identify the cause of the wrong results when evaluating partitions in the share/db/operators.g scene, since the regiontable seems to be correctly built. By comparing the output produced by the ANSI C version and the OpenCL version, I identified several occurrences of segments in partitions with a wrong 'seg_sti', which would explain the wrong results when evaluating partitions. Despite the wrong 'seg_sti', the segments of the partitions have the correct values (the hit_dist values of the segments match those of the ANSI C partitions).
=== 11 July ===
* After further investigation of the 'seg_sti' issue, it appears that the boolean trees use the bits of the ordered primitives, while the 'seg_sti' values calculated by the 'store_segs' kernel are the same as 'rtip->rti_Solids->st_bit'. To confirm this, I translated the bits of the boolean trees to the 'rtip->rti_Solids' bits before sending the boolean trees to OpenCL, and got the following results:
* Hopefully this helps us find the cause of the 'seg_sti' mismatch between the segments and the bits in the boolean tree.
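The bit translation used in that experiment amounts to a lookup table applied to every leaf of the tree before the buffer is uploaded. A minimal sketch, assuming a hypothetical tree_to_seg_bit table built on the host that maps the bit stored in the tree to the bit reported by the segments (for example the solid's st_bit); the node type is a placeholder, not BRL-CAD's union tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened tree node: leaves carry a primitive bit. */
struct tree_node {
    int is_leaf;
    uint32_t bit;
};

/* Rewrite every leaf bit from the tree's ordering to the ordering used
 * by the segments, using a lookup table built on the host.
 * Operator nodes are left untouched. */
static void translate_tree_bits(struct tree_node *nodes, size_t nnodes,
                                const uint32_t *tree_to_seg_bit, size_t nbits)
{
    for (size_t i = 0; i < nnodes; i++) {
        if (!nodes[i].is_leaf)
            continue;
        assert(nodes[i].bit < nbits);
        nodes[i].bit = tree_to_seg_bit[nodes[i].bit];
    }
}
```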
=== 12 - 14 July ===
* Cleaning and refactoring the code in order to submit the patch against the opencl branch in svn.
* While testing the code against some more complex scenes, found and fixed the bug that was causing some of those scenes to take more time than expected to render.
* Found and fixed the bug that was causing some 'holes' in some views. Here is an example of a view that was having this problem, now fixed:
* Will finish the rt_boolfinal kernel by implementing the handler for overlapping partitions before submitting the patch to svn.
=== 17 July ===
* Implemented an overlap handler to resolve overlapping partitions.
* Completed the rt_boolfinal kernel. There are still some bugs with more complex scenes, which I believe are caused by some partitions reporting overlaps when they shouldn't (compared with the ANSI C code), but I still have to investigate.
* Cleaned and refactored the code and submitted the patch in the svn: https://sourceforge.net/p/brlcad/patches/472/ (This was created against the opencl code branch)
* Planning to optimize the code next, starting by implementing a list for the regiontable, instead of using a dynamic bitvector.
=== 18 July ===
* Debugged the code and found the cause of the weird results when drawing geometry with the command 'e *'. It comes from a bug in the overlap handler that might be hard to resolve using bitarrays.
* Working on a solution that uses an array to represent the regiontable, instead of a bitarray.
=== 19 July ===
* Adjacent partitions should no longer write to the same memory positions of the bitarray. This was causing CSG scenes with more than 32 regions and/or more than 32 segments per ray to produce wrong results.
* Optimized the code that iterates over bitarrays, so it now visits only the set bits.
* Working on a fix for the overlap handler problem.
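Both bitarray changes from 19 July can be illustrated with a short sketch: each partition gets its own row of 32-bit words, so adjacent partitions never share a word, and iteration jumps straight from one set bit to the next instead of testing every bit. The helper names are hypothetical, and __builtin_ctz is a GCC/Clang builtin used here for convenience; the actual fix in the branch may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Each partition owns its own row of 'nwords' words, so adjacent
 * partitions never write to the same word. */
static uint32_t *partition_row(uint32_t *table, size_t p, size_t nwords)
{
    return table + p * nwords;
}

/* Store the index of every set bit of a row in 'out', skipping zero
 * words entirely. Returns the number of indices written. */
static size_t collect_set_bits(const uint32_t *bits, size_t nwords,
                               uint32_t *out, size_t max_out)
{
    size_t n = 0;

    for (size_t w = 0; w < nwords; w++) {
        uint32_t word = bits[w];
        while (word != 0) {
            uint32_t bit = (uint32_t)__builtin_ctz(word);  /* lowest set bit */
            assert(n < max_out);
            out[n++] = (uint32_t)(w * 32u) + bit;
            word &= word - 1u;  /* clear the lowest set bit */
        }
    }
    return n;
}
```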
=== 20-21 July ===
* Resolved the weird results when drawing geometry with the command "e *"
* Changed the code to use only 1 regiontable per ray, instead of 1 regiontable per partition, which reduces memory usage.
* Prepared the code and submitted the patch in the svn (https://sourceforge.net/p/brlcad/patches/472/)
=== 24 July ===
* Made some changes to the 'shade_segs' kernel to replicate the "Surface normals" lighting model. It looks like there is an issue with the interior normals, but I am not sure whether this comes from the known issue of the primitives having their ids in a different order.
* Collected some time measurements over some scenes in the 'share/db' directory. I wasn't very lucky with the profiling tools, so I ended up using the elapsed time reported by the ray tracing command.
* Tomorrow I will add some more measurements and share the document with the results.
=== 25 July ===
* Did some further testing over some scenes in 'share/db' to gather more statistics to optimize the code
=== 26 July ===
* Fixed the issue that was causing some interior normals to be wrongly represented
* Working on a solution to iterate only over partitions evaluated in the shading process
=== 27 July ===
* Optimized the process of building the regiontable by precomputing a table with all the regions involved in each primitive instead of doing this for each partition in the rt_boolfinal kernel.
* This change resulted in a significant performance improvement, as you can see in the following table:
(Running the OpenCL over the NVIDIA OpenCL SDK on GPU - Debug build)
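The precomputed table can be pictured as a compact list of regions per primitive, so building a partition's regiontable only touches the regions that reference the primitives of its segments. A minimal sketch, assuming a CSR-style layout (hypothetical prim_offsets and prim_regions arrays built once on the host) and that each segment records the index of its primitive; the actual layout in the branch may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical precomputed table, built once on the host: the regions
 * that reference primitive p are
 *   prim_regions[prim_offsets[p]] .. prim_regions[prim_offsets[p + 1] - 1].
 * 'seg_prims' holds the primitive index of each segment in the partition. */
static void build_regiontable(const uint32_t *seg_prims, size_t nsegs,
                              const uint32_t *prim_offsets,
                              const uint32_t *prim_regions,
                              uint32_t *regiontable, size_t nwords)
{
    memset(regiontable, 0, nwords * sizeof(uint32_t));
    for (size_t s = 0; s < nsegs; s++) {
        uint32_t p = seg_prims[s];
        for (uint32_t k = prim_offsets[p]; k < prim_offsets[p + 1]; k++) {
            uint32_t r = prim_regions[k];
            assert(r / 32u < nwords);
            regiontable[r / 32u] |= 1u << (r % 32u);  /* mark region r */
        }
    }
}
```

The cost per partition then depends on how many regions reference its primitives rather than on the total number of regions in the scene.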
=== 31 July ===
* Changed the code to skip unevaluated partitions during the shading process. This change doesn't appear to have much impact on performance for the havoc.g scene (2.58sec before vs 2.52sec now).
* The main bottleneck in the code right now seems to be the bool_eval() function. For what it's worth, disabling the function renders the havoc scene in only 0.35sec (vs 2.52sec with the function enabled).
=== 1 August ===
* Found and fixed the bug that was causing some scenes to shade the wrong partitions in the ray. The closest partition in the ray is being shaded now.
* Example of what was happening before vs what is happening now:
=== 2 - 4 August ===
* Working on a linearized binary tree representation to perform bool_eval() in OpenCL in a similar way to the current code in the trunk, instead of using the boolean tree in RPN.
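One possible linearized layout, shown here only as a sketch, stores the tree nodes in preorder so that the left child of an operator is the next element and the node itself only stores the index of its right child. Evaluation then needs no recursion (which OpenCL C does not allow): a small explicit stack records the pending operators, and the right subtree is skipped whenever the left operand already decides the result. The node encoding and function name are hypothetical, and only union, intersection and subtraction are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node encoding for a preorder-linearized boolean tree. */
enum { OP_LEAF, OP_UNION, OP_INTERSECT, OP_SUBTRACT };

struct lin_node {
    uint8_t op;    /* OP_LEAF or one of the boolean operators */
    uint32_t arg;  /* leaf: primitive bit; operator: index of right child */
};

#define EVAL_STACK_MAX 64

/* Evaluate the tree rooted at 'root'. The left child of an operator at
 * index i is i + 1; 'in_partition' is a bitarray of the primitives
 * present in the partition being evaluated. */
static int bool_eval_linear(const struct lin_node *t, uint32_t root,
                            const uint32_t *in_partition)
{
    uint32_t stack[EVAL_STACK_MAX];  /* operators still waiting for a result */
    int on_right[EVAL_STACK_MAX];    /* 1 once the right child is being evaluated */
    int sp = 0;
    uint32_t idx = root;
    int ret;

    for (;;) {
        /* Walk down the left spine until a leaf is reached. */
        while (t[idx].op != OP_LEAF) {
            assert(sp < EVAL_STACK_MAX);
            stack[sp] = idx;
            on_right[sp] = 0;
            sp++;
            idx++;
        }
        ret = (int)((in_partition[t[idx].arg / 32u] >> (t[idx].arg % 32u)) & 1u);

        /* Walk back up until some operator still needs its right child. */
        int need_right = 0;
        while (sp > 0) {
            const struct lin_node *n = &t[stack[sp - 1]];
            if (!on_right[sp - 1]) {
                /* Left operand known: 1 decides a union, 0 decides an
                 * intersection or a subtraction, so skip the right side. */
                if ((n->op == OP_UNION && ret) || (n->op != OP_UNION && !ret)) {
                    sp--;
                    continue;
                }
                on_right[sp - 1] = 1;
                idx = n->arg;
                need_right = 1;
                break;
            }
            /* Right operand known: the left one was 0 for a union and 1
             * otherwise, so only subtraction changes the value. */
            if (n->op == OP_SUBTRACT)
                ret = !ret;
            sp--;
        }
        if (!need_right)
            return ret;
    }
}
```

For example, (A ∪ B) − C is stored as SUBTRACT(right=4), UNION(right=3), A, B, C; when A is present the union never visits B.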
=== 6 August ===
* Changing bool_eval() and other auxiliary functions to use the new tree structure during boolean evaluation.
=== 7 August ===
* Finished the implementation of bool_eval() using the new tree representation. The new code seems to be slightly slower than the version in the trunk for some scenes, but still considerably faster than the previous method that uses the RPN tree. Thus, porting the new code to OpenCL should improve the performance of the OpenCL code, which currently uses the RPN tree method.
* Here is a table with a time comparison between the code in the trunk, the code with the new tree structure and the code currently in the opencl branch (RPN tree):
(Release build)
* Removed the 'next_evalpp' from the partition structure, which wasn't necessary in the first place
=== 8 August ===
* Made some optimizations in the ANSI C bool_eval() prototype and now the new code is slightly faster than the code currently in the trunk! This difference is more noticeable when rendering the havoc scene with the command "rt -s2048": 2.53sec vs 2.18sec.
(Release build)
=== 9 August ===
* Manually merged the new code over the opencl branch code.
* Changed the function 'rt_pr_bit_tree()', used to debug the new tree structure, so that its output matches the output of the union tree debug function (previously the function was printing newlines out of place).
* Updated some structures and some host functions to create the new OpenCL buffer with the new tree representation.
* Planning to finish the bool_eval() function of the OpenCL code tomorrow.
=== 10 August ===
* Finished the port of the new_bool() function to OpenCL. This new bool_eval() function uses a new boolean tree representation, and follows the behaviour of the current ANSI C code in the trunk.
* Here is a table with the time comparison of the previous implementation (bool_eval() with RPN tree) and the current version with the new tree representation:
(running the OpenCL over the Intel OpenCL SDK on CPU - Release build)
=== 11 August ===
* Cleaning the code to prepare for commit.
* General testing over the ANSI C boolean evaluation and the OpenCL boolean evaluation, trying to identify where the bottlenecks are located.
* Committed the new code to perform boolean evaluation in the opencl branch code (https://sourceforge.net/p/brlcad/code/70074/)
=== 14 August ===
* Discussed the plan for the next weeks with my mentor, Vasco, via Skype.
* Will clean the current code and prepare a patch ticket against the trunk with the new CSG boolean evaluation in OpenCL.
* Next, will start changing the rendering loop of the OpenCL code to follow the behaviour of the ANSI C code, where boolean evaluation is performed in a partial fashion.
=== 15 August ===
* Cleaning and refactoring the code
* Preparing patch against trunk code
=== 17 August ===
* Submitted patch with code to perform boolean evaluation of CSG with OpenCL (https://sourceforge.net/p/brlcad/patches/474/)
* Started working on new rendering loop, where the 'store_segs', the 'rt_boolweave' and 'rt_boolfinal' kernels will be merged into a single kernel, so the weave of segments and evaluation of partitions can take place as soon as new segments are created.
=== 18 August ===
* Merged the 'store_segs' and 'rt_boolweave' kernels into a new kernel. Still trying to avoid re-weaving segments that were already woven into the ray.
* Planning to add the 'rt_boolfinal' kernel next to follow the behaviour of the ANSI C code.
=== 21 - 25 August ===
* Changed the 'rt_boolfinal' function in order to merge it into the single kernel (rt_shootray kernel: store_segs + rt_boolweave + rt_boolfinal).
* Fixed a bug that caused the ray tracing to crash with this new system, by storing the index of the head of partitions for the current ray and passing it by argument to the rt_boolweave function.
* Trying to fix the problem of some partitions being evaluated too early, which caused some pixels to shade the wrong partitions.
* In the end, I couldn't figure out a way to weave and evaluate partitions in a partial way using the BVH (bounding volume hierarchy), because the BVH nodes aren't in spatial order. This optimization was promising, and it should work if we change the code to store the nodes in a spatial subdivision structure like the kd-tree, as the ANSI C code does.
* Since we are processing all the hits before weaving segments and evaluating partitions, there is no need to keep evaluating partitions after the first opaque partition in the ray is evaluated. Because the partitions are already ordered by their in_hit point, and because all segments of the ray have been processed, no partition closer to the ray origin can appear after the first partition is evaluated, so it is unnecessary and expensive to keep evaluating partitions for the ray.
* By stopping the evaluation of partitions once the first evaluated partition of the ray is found, the performance of the OpenCL code increased significantly, as we can see in the following table:
(running the OpenCL over the Intel OpenCL SDK on CPU - Release build)
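The early exit described in the last entry reduces to walking the sorted partition list and stopping at the first partition that evaluates true. A minimal sketch, assuming index-based 'forw_pp' links and a hypothetical eval_partition() stand-in that just reads a precomputed flag and counts calls; the transparency test implied by "first opaque partition" is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NIL UINT32_MAX  /* hypothetical "no partition" marker */

struct partition {
    uint32_t forw_pp;    /* next partition (sorted by in-hit distance) or NIL */
    int evaluates_true;  /* stand-in for the result of the boolean evaluation */
};

/* Hypothetical stand-in for the real per-partition boolean evaluation;
 * it counts how many partitions were evaluated. */
static int eval_partition(const struct partition *pp, uint32_t idx, int *evals)
{
    (*evals)++;
    return pp[idx].evaluates_true;
}

/* Return the first (closest) partition that evaluates true, or NIL.
 * All segments were woven before this loop and the list is sorted by
 * in-hit distance, so no later partition can be closer: stop at the
 * first one that evaluates true. */
static uint32_t first_evaluated_partition(const struct partition *pp,
                                          uint32_t head, int *evals)
{
    assert(evals != NULL);
    for (uint32_t i = head; i != NIL; i = pp[i].forw_pp) {
        if (eval_partition(pp, i, evals))
            return i;
    }
    return NIL;
}
```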
=== GSoC17 is Over!! ===
* Google Summer of Code 2017 comes to an end! It was an amazing experience and I couldn't be happier with this first introduction to open source software development!
* I would like to thank the BRL-CAD community for giving me this opportunity and for always being available to help!
* A special thanks to Vasco Costa, for the great mentoring and guidance through the summer!!
* Here is the work product link that I submitted: https://github.com/MarcoSDomingues/GSoC17
