User:KeshaSShah/GSoC13/Priority1


Project Title:

Code Reduction and Refactoring for Reduced Maintenance Cost.

Brief project summary

BRL-CAD is a large code base of more than a million lines of code across hundreds of binaries and dozens of libraries. Improving maintainability is an ongoing need, which includes identifying duplicated code and refactoring it. This project entails identifying common patterns of duplication throughout the code with the help of tools. Once identified, the code can be carefully refactored into new routines, common functionality, library code, etc. Testing is required to confirm that each refactoring is correct and that functionality has not changed.

Detailed project description

  • Most of my fellow participants are focused on keeping the code working and adding functionality to it, but hardly anyone pays attention to keeping the container for that code in good order.
  • Poorly designed code usually takes more code to do the same things, often because the code quite literally does the same thing in several places. By eliminating these duplicates, we ensure that the code says everything once and only once, which is the essence of good design.
  • When developers are trying to get a program to work, they are usually not thinking about the future developer. It takes a change of rhythm to make changes that make the code easier to understand. Refactoring makes the code more readable. A little time spent refactoring helps the code communicate its purpose better.
  • As the code gets clearer, one can see things about the design that could not be seen before, because it is difficult to hold all of it in one's head. Refactoring is like wiping the dirt off a window so you can see beyond it.
  • When I worked on my own projects (writing a shell, implementing malloc, and some data structure projects), I realized that refactoring while studying code leads me to a higher level of understanding that I would otherwise miss.
  • A few days ago, after downloading the source code, I opened a file at random (/src/libbn/plot3) to glance at the code. It was 842 lines long; within 5 minutes I had removed 190 lines and added just 86 in return. Even this quick pass gave a net reduction of 104 lines, about 12% of that file. I checked some other directories too and found a great deal of overlapping code.
  • For example:
void
pd_move(register FILE *plotfp, double x, double y)
{
    size_t ret;
    double in[2];
    unsigned char out[2*8+1];

    if (pl_outputMode == PL_OUTPUT_MODE_BINARY) { 
	in[0] = x;
	in[1] = y;
	htond(&out[1], (unsigned char *)in, 2);

	out[0] = 'o';
	ret = fwrite(out, 1, 2*8+1, plotfp);
	if (ret != 2*8+1) {
	    perror("fwrite");
	 }
    } else {
	fprintf(plotfp, "o %g %g\n", x, y);
    }
}
void
pd_cont(register FILE *plotfp, double x, double y)
{ 
    size_t ret;
    double in[2];
    unsigned char out[2*8+1];

    if (pl_outputMode == PL_OUTPUT_MODE_BINARY) {
	in[0] = x;
	in[1] = y;
	htond(&out[1], (unsigned char *)in, 2);

 	out[0] = 'q';
	ret = fwrite(out, 1, 2*8+1, plotfp);
	if (ret != 2*8+1) {
	    perror("fwrite");
	}
    } else {
 	fprintf(plotfp, "q %g %g\n", x, y);
    }
} 
  • Here, pd_move() and pd_cont() contain the same lines of code. They are copies of each other; the only difference is the character assigned to out[0] and printed by fprintf().
  • So I moved the shared body into a common function and call it from both:
/* Common helper shared by pd_move() and pd_cont() */

void
pd(register FILE *plotfp, double x, double y, char c)
{
    size_t ret;
    double in[2];
    unsigned char out[2*8+1];

     if (pl_outputMode == PL_OUTPUT_MODE_BINARY) {
	in[0] = x;
	in[1] = y;
	htond(&out[1], (unsigned char *)in, 2);

	out[0] = c;
	ret = fwrite(out, 1, 2*8+1, plotfp);
	if (ret != 2*8+1) {
	    perror("fwrite");
 	}
    } else {
	fprintf(plotfp, "%c %g %g\n", c, x, y);
    }
}
void
pd_move(register FILE *plotfp, double x, double y)
{
    pd(plotfp, x, y, 'o');	/* use the common function */
}

void
pd_cont(register FILE *plotfp, double x, double y)
{
    pd(plotfp, x, y, 'q');	/* use the common function */
}
  • In src/libbn/plot3.c alone there are 3 such pairs, i.e. 6 functions in all that share the same code. Similarly, there are many repeated blocks of code across the entire codebase, which I plan to work on.
  • Reducing the amount of code makes a big difference when the code has to be modified. The more code there is, the harder it is to modify correctly. This should therefore be done as early as possible, since developers add new code every day.
  • The above is what I intend to do as part of GSoC.
  • This is not just about the 3 months of GSoC; I want to stay committed to this code for years. I will keep contributing afterwards, because this is an endless job: new code will keep being added, and it will in turn need reducing and refactoring. Moreover, by the end of the 3 months I believe I will know the source code well enough to work on other projects too and help make BRL-CAD more awesome!

Links to any code or algorithms you intend to use

TOOLS :

  • A shell script for finding duplication, together with the grep command, will give me many pointers.
  • I also plan to use the Simian tool; if the need arises or my mentor suggests it, I will learn and use another tool.

ALGORITHM I intend to use:

  • Step 1: Select one or more defects to review.
  • Step 2: Mark them as claimed and being worked on.
  • Step 3: If the duplicated code is more than three lines, go to step 4; otherwise change it according to my judgment.
  • Step 4: If it occurs in more than one file, go to step 5; otherwise go to step 8.
  • Step 5: If it occurs in more than one directory, go to step 6; otherwise go to step 7.
  • Step 6: It probably warrants a new function in the highest common library. Go to step 11.
  • Step 7: Create a function, add its declaration to a private header, and go to step 11.
  • Step 8: If it is in a library, go to step 9; otherwise go to step 10.
  • Step 9: Create a hidden/static function or macro in the same file and go to step 11.
  • Step 10: Create a static function or macro in the same file and go to step 11.
  • Step 11: If it warrants a regression test, go to step 12; otherwise go to step 13.
  • Step 12: Add a regression/unit test, commit, and go to step 14.
  • Step 13: If it is user visible, go to step 14; otherwise go to step 15.
  • Step 14: Update TODO and go to step 15.

/* At this point, check once more that the change was worthwhile and leaves no duplication behind, that a regression or unit test exists where warranted and passes, and that all docs are updated (TODO, deprecation.txt, DocBook, Doxygen comments, etc.) */

  • Step 15: Commit, mark as 'Needs Review', and go to step 16.
  • Step 16: Estimate the impact, update the severity, and go to step 17.
  • Step 17: Record in the ledger and go to step 1.
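
The placement decision in steps 3 to 10 can be sketched as a small C function. This is only an illustration of the flow above, not existing BRL-CAD code: the struct dup_report, the enum dup_action and the function dup_classify() are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one reported duplication. */
struct dup_report {
    size_t lines;        /* length of the duplicated block, in lines */
    size_t files;        /* number of files that contain the block */
    size_t directories;  /* number of directories those files span */
    bool in_library;     /* true if the containing file is library code */
};

/* Hypothetical outcomes, one per terminal step of the decision flow. */
enum dup_action {
    DUP_USE_JUDGMENT,    /* step 3: three lines or fewer */
    DUP_COMMON_LIBRARY,  /* step 6: new function in highest common library */
    DUP_PRIVATE_HEADER,  /* step 7: function declared in a private header */
    DUP_HIDDEN_STATIC,   /* step 9: hidden/static function or macro */
    DUP_STATIC_OR_MACRO  /* step 10: static function or macro */
};

enum dup_action
dup_classify(const struct dup_report *r)
{
    assert(r != NULL);

    if (r->lines <= 3)
	return DUP_USE_JUDGMENT;        /* step 3, "else" branch */

    if (r->files > 1) {                 /* step 4 */
	if (r->directories > 1)         /* step 5 */
	    return DUP_COMMON_LIBRARY;  /* step 6 */
	return DUP_PRIVATE_HEADER;      /* step 7 */
    }

    if (r->in_library)                  /* step 8 */
	return DUP_HIDDEN_STATIC;       /* step 9 */
    return DUP_STATIC_OR_MACRO;         /* step 10 */
}
```

For instance, a ten-line block found in three files of a single directory would be classified as DUP_PRIVATE_HEADER. Steps 11 to 17 (testing, documentation, review and bookkeeping) are manual and are not modelled here.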

Problems I might encounter and how I will solve them

  • Code refactoring and reduction is a critical task, because it touches most of the folders and files in the source tree and modifies, deletes and adds lines as required. There is a real possibility of changing the functionality of a program and making it misbehave.
  • So my plan is to compile after every change. The compiler will complain about any syntax errors and missing definitions.
  • Moreover, I will run unit tests to check that there are no run-time errors and that the behavior of the code remains the same.
  • For everything else, Google, my best friend who knows everything, is always with me and will help in every situation.
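
A behavior-preserving check of the kind described above could look like the following sketch. It covers only the text-mode branch of the pd_move() example, and the functions in it are stand-ins written for this illustration (orig_move_text(), shared_text(), refactored_move_text(), capture() and same_text_output() are hypothetical names, not BRL-CAD API). The idea is to capture what the old and the new version write and compare the bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the text-mode branch of the original pd_move(). */
static void
orig_move_text(FILE *fp, double x, double y)
{
    fprintf(fp, "o %g %g\n", x, y);
}

/* Stand-in for the text-mode branch of the shared helper. */
static void
shared_text(FILE *fp, double x, double y, char c)
{
    fprintf(fp, "%c %g %g\n", c, x, y);
}

/* Stand-in for the refactored pd_move() wrapper. */
static void
refactored_move_text(FILE *fp, double x, double y)
{
    shared_text(fp, x, y, 'o');
}

typedef void (*plot_fn)(FILE *, double, double);

/* Run fn on a temporary file and copy its output into buf.
 * Returns the number of bytes captured, or -1 on failure. */
static long
capture(plot_fn fn, double x, double y, char *buf, size_t bufsize)
{
    FILE *fp;
    size_t n;

    assert(buf != NULL && bufsize > 0);
    fp = tmpfile();
    if (fp == NULL)
	return -1;
    fn(fp, x, y);
    rewind(fp);
    n = fread(buf, 1, bufsize - 1, fp);
    buf[n] = '\0';
    fclose(fp);
    return (long)n;
}

/* Returns 1 if both versions write identical bytes, 0 otherwise. */
int
same_text_output(double x, double y)
{
    char before[128], after[128];
    long nb = capture(orig_move_text, x, y, before, sizeof(before));
    long na = capture(refactored_move_text, x, y, after, sizeof(after));

    return nb > 0 && nb == na && memcmp(before, after, (size_t)nb) == 0;
}
```

A unit test would call same_text_output() for a handful of coordinates and fail if any call returns 0. The binary branch would need the same comparison on the bytes produced through htond(), which is left out of this sketch.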

Deliverables

  • Consolidating and eliminating similar code
  • Breaking extraordinarily long functions into more manageable pieces
  • Making the code more readable and maintainable
  • Making it easier to document
  • Creating reusable code
  • Better function cohesion
  • My final goal is to reduce the size of the entire source by 8-10% of its present size.

Importance of the Project:

  • Improving Design: As people change the code to meet short-term goals, the code loses its structure and the program decays.
  • Makes Software Easier to Understand: A compiler taking a little more overhead, or output arriving a few milliseconds later, is not a huge concern to most of us. What worries us is that a new developer has to spend a week figuring out the part he wants to change, which would be an hour's job if the code were well managed.
  • Easier to Add New Features

Time availability:

  • I will be available for the whole GSoC period, as my summer break runs from 1st May to the last week of July. In August the new semester will only just have begun, so I will still be able to spare enough time for the project, and I am not involved in anything else that would be a hindrance.
  • I will also work on Sundays if needed to catch up, in case unavoidable circumstances keep me from meeting my development schedule.

Even after the GSoC period I will have enough time to continue contributing to BRL-CAD.

Development schedule:

/* Pre-GSoC Period */

  • April 19- 22 : Downloaded Source-code and made one initial patch.
  • April 23-30 : Final End-Semester Exams at University.
  • May 1-3 : Submit some more patches to showcase my work.
  • May 4-21: Ensure the build environment is working, research how to maintain a large code base, stay in constant contact with the mentors to discuss efficient approaches and ask questions, gain in-depth knowledge of Zero One or Infinity (ZOI), Don't Repeat Yourself (DRY), the Rule of Three and Cargo Cult Programming, and make some more patches.
  • During this time I will also create a folder that records the duplication found in each directory.

/* GSoC period Starts */

  • May 21 - May 25: (PHASE I: Analysis and Design Period)
    • Getting more familiar with the code
    • At the end of this period: A finalized plan to proceed with the development of the library.
  • May 26 - August 15: (PHASE II: Development Phase)
    • June 15 - July 7: Refactoring and coding. Start fixing duplications here.
    • July 7 - July 14: Bug fixing and review, to be ready for submission.
    • July 15 - July 21: Organizing source code documentation so that it works well with Doxygen.
    • July 21 - July 24: Submitting, further review and updates if necessary, and preparing the mid-term report.
    • July 25 - August 15: Modifying other files in the source code that are affected by the earlier changes.
    • At the end of this period: A complete library with all functionalities, all of them tested at unit level.
  • August 16 - September 27: (PHASE III: Testing, Cleaning and Wrapping Up)
    • August 16 - August 20: System-level testing on multiple platforms.
    • August 21 - September 9: Bug fixing and review, to be ready for submitting the final result.
    • September 10 - September 22: Pencils down, code clean-up, documentation (wiki pages).
    • September 23 - September 27: Final evaluation and submitting code to Google.
    • At the end of this period: The deliverables mentioned above will have been completed.

My preparation for the Project:

  • Joined IRC channel #brlcad
  • I have written to the BRL-CAD mailing list about my interests and my plans for the near future.
  • Looked through previous years' projects, where I found one similar project. I am also in contact with that candidate, who will guide me and share her experience with me.
  • Cloned the source code onto my system.
  • Opened some src files, read some of the code and found a lot of code repetition there. As my first step, I modified the file /src/libbn/plot3 and reduced the amount of code in it.
  • I had no previous experience with the SVN version control system. I made an account on sourceforge.net and submitted my first patch.
  • Right now I am learning about the tools used for checking code similarity and about how to write unit tests.
  • Becoming familiar with different rules of code reduction such as Zero One or Infinity (ZOI), Don't Repeat Yourself (DRY), the Rule of Three and Cargo Cult Programming.
  • Because of my final exams, I am afraid I cannot work to my full potential until 30th April. From 1st May to 3rd May, however, I plan to submit some more patches to show my work.

Why me?

  • I am a suitable candidate for the "Code Refactoring" project because, most importantly, I love programming in C and C++, and I find figuring out other people's code a fascinating job.
  • Moreover, I won 3rd and 1st place in the "i-code and C-debugging contest" organized by IEEE as part of I-Fest at DA-IICT in 2012 and 2011 respectively.
  • I have been certified as an excellent C and C++ coder with an A++ grade by the Centre for Development of Advanced Computing (CDAC), Pune, India.

Why BRL-CAD?

  • The code is largely written in C, the programming language in which I am the strongest.
  • Also, I chose BRL-CAD because I think it's great to work on such a powerful Open Source modeling system.

I promise that, if given this opportunity, I will work with full dedication, give my 100% effort, leave no stone unturned, and reduce the code while making it more readable.