Difference between revisions of "SJ1"

== Personal details ==
 
* Name: Suryajith Chillara
 
* irc handle: Stattrav
 
* email id: suryajith1987@gmail.com
 
* background: see the Background section at the end of the page.
 
  
= Sources =
 
 
== ftp ==
 
Users could just transfer/submit their logs to the public ftp server.
 
 
== scp ==
 
Developers could transfer the logs to a specific folder on the server.
 
 
== Mail ==
 
Users mail the benchmark file to "benchmark@brlcad.org".
 
 
== http API ==
 
An HTTP API receives the file as a POST message, which could be sent using curl or an equivalent tool.
 
 
== Diagram ==
 
(Preliminary)
 
[[Image:Stattrav_process_diagram.jpg|500px]]
 
 
= Mechanism =
 
 
== Web API ==
 
An HTTP API is needed that accepts the benchmark file either as the body of a POST message or through a file upload mechanism. The HTTP call can be embedded in the benchmark shell script, or placed in a separate script so that the user can just run "benchmark-post" or some such command. The file upload could be automated using urllib2 (Python) or an equivalent library in another language. If this mechanism is included in the benchmark shell script, it could be enabled with an extra argument such as "--push-result-to-web=true".
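
A minimal sketch of the client side of such an upload, using Python 3's urllib.request (the successor of urllib2). The endpoint URL, the form field name "logfile" and both function names are assumptions made for illustration; the proposal does not fix any of them.

```python
import os
import urllib.request
import uuid

# Hypothetical endpoint: the proposal does not name an upload URL.
UPLOAD_URL = "http://example.org/benchmark/upload"


def build_upload_request(log_bytes, filename, url=UPLOAD_URL):
    """Build a multipart/form-data POST request carrying one benchmark log."""
    boundary = uuid.uuid4().hex
    disposition = (
        'Content-Disposition: form-data; name="logfile"; filename="%s"' % filename
    )
    body = b"\r\n".join([
        b"--" + boundary.encode("ascii"),
        disposition.encode("utf-8"),
        b"Content-Type: text/plain",
        b"",                      # blank line separates part headers from data
        log_bytes,
        b"--" + boundary.encode("ascii") + b"--",
        b"",
    ])
    headers = {"Content-Type": "multipart/form-data; boundary=" + boundary}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post_benchmark(path, url=UPLOAD_URL):
    """Read a log file from disk, upload it, and return the HTTP status code."""
    with open(path, "rb") as handle:
        request = build_upload_request(handle.read(), os.path.basename(path), url)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

A "benchmark-post" wrapper, or the "--push-result-to-web=true" branch of the shell script, could then simply call post_benchmark on the freshly written log file.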
 
 
== FTP sync ==
 
Similar to the HTTP push, an FTP sync can be implemented at the user end. The files are submitted to a queue folder, where a polling script checks for new files and introduces them into the db and the file storage folder. The polling script (running, say, every 5 minutes) finds the files created after the last poll and checks whether they have already been introduced into the db by comparing their md5sum with the stored values (which could be kept in a separate table). The script writes a log file, which can be used to check that it has worked properly. A second, cleaning script could check these logs, push any log files that have not yet been moved into the db and the file storage folder, and send an email if there are any discrepancies. The log files in the file storage folder could be stored as .gz.
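
One polling pass of the kind described above might look like the following sketch. It assumes SQLite for the checksum table; the table name "seen_logs", the stored file naming and the function names are all hypothetical, and the logging and cleaning scripts are left out.

```python
import gzip
import hashlib
import os
import shutil
import sqlite3


def md5sum(path):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def poll_once(queue_dir, storage_dir, conn, last_poll=0.0):
    """Ingest queue files modified after last_poll; return the names stored."""
    # Assumed schema: one row per log already introduced into the db.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS seen_logs (md5 TEXT PRIMARY KEY, name TEXT)"
    )
    os.makedirs(storage_dir, exist_ok=True)
    stored = []
    for name in sorted(os.listdir(queue_dir)):
        path = os.path.join(queue_dir, name)
        if not os.path.isfile(path) or os.path.getmtime(path) <= last_poll:
            continue
        checksum = md5sum(path)
        known = conn.execute(
            "SELECT 1 FROM seen_logs WHERE md5 = ?", (checksum,)
        ).fetchone()
        if known:
            continue  # same content was ingested before; leave it for the cleaner
        # Store the log compressed; the checksum prefix avoids name clashes.
        target = os.path.join(storage_dir, "%s_%s.gz" % (checksum, name))
        with open(path, "rb") as src, gzip.open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        conn.execute(
            "INSERT INTO seen_logs (md5, name) VALUES (?, ?)", (checksum, name)
        )
        conn.commit()
        os.remove(path)  # the file has been moved out of the queue
        stored.append(name)
    return stored
```

A cron entry or a simple loop could call poll_once every 5 minutes, passing the time of the previous run as last_poll.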
 
 
== scp sync ==
 
This works like the FTP sync: developers copy their logs into the queue folder with scp, and the same polling script picks them up from there.
 
 
== Mail server ==
 
Similar to the FTP/scp sync, a polling script could be written to check the IMAP server and bring in the attachments.
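
As a rough sketch of such a polling script, the following uses Python's standard imaplib and email modules. The function names are hypothetical, and host, user and password are placeholders that the real deployment would supply; only unseen messages are fetched, which is an assumption about how "new" mail would be identified.

```python
import email
import imaplib
from email import policy


def extract_attachments(raw_message):
    """Return (filename, payload bytes) for every attachment in a raw message."""
    message = email.message_from_bytes(raw_message, policy=policy.default)
    return [
        (part.get_filename(), part.get_payload(decode=True))
        for part in message.iter_attachments()
    ]


def fetch_unseen_attachments(host, user, password, mailbox="INBOX"):
    """Log in over IMAP/SSL and collect attachments from all unseen messages."""
    attachments = []
    with imaplib.IMAP4_SSL(host) as server:
        server.login(user, password)
        server.select(mailbox)
        _status, data = server.search(None, "UNSEEN")
        for number in data[0].split():
            _status, parts = server.fetch(number, "(RFC822)")
            # parts[0] is a (header, raw message bytes) tuple
            attachments.extend(extract_attachments(parts[0][1]))
    return attachments
```

The extracted attachments could be written into the same queue folder used by the FTP/scp sync, so that one ingestion path handles all sources.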
 
 
= Data extraction =
 
TODO
 
 
 
= Storage =
 
== Values to be maintained in the db ==
 
TODO
 
== flat file on the disk ==
 
TODO
 
== Db Schema ==
 
TODO
 
== Backup ==
 
TODO
 
 
 
= User-end code changes =
 
== User-end tools ==
 
TODO
 
== Other data needed for analysis ==
 
TODO
 
== Cross-platform solutions for the data ==
 
TODO
 
 
= Core scripts to be written =
 
== FTP/scp sync ==
 
TODO
 
== FTP/scp verifier ==
 
TODO
 
== IMAP sync ==
 
TODO
 
== IMAP verifier ==
 
TODO
 
== files to db ==
 
TODO
 
== http-push ==
 
TODO
 
== ftp-push ==
 
TODO
 
 
== Frontend ==
 
TODO
 
 
== Analysis ==
 
TODO
 
 
== Background ==
 
I am a first-year grad student at Chennai Mathematical Institute studying computer science.
 
I've been a Google Summer of Code scholar with the Sahana Software Foundation (code: https://bitbucket.org/suryajith/gsoc-2010; reference: lifeeth@gmail.com, who was my mentor), where I developed a handwritten character recognition module implemented as a wrapper over tesseract, an automated training module for that recognition module, and an entirely independent module for generating OCR/HCR-able forms.
 
 
After the summer of 2010, I joined a web startup (www.capillary.co.in) as a software developer and worked on the company's custom MVC, which was refactored and mostly rewritten during my time there. I played an important role in developing many essential web APIs, the controller and model code of various modules, and the scaling/performance tuning of the LAMP stack. I am particularly pleased with the database connection layer that I implemented there. Apart from this, I have been working on a few independent modules, implemented in Python, that involved a lot of machine learning and text parsing. I am very familiar with web paradigms, web framework design and backend scaling. Though I have so far worked predominantly with Python and PHP with MySQL, I can pick up and work with any other language or tool required fairly quickly.
 
 
I've been lurking and following brlcad for around 3-4 years now, and I intend to stay around and keep working on it.
 

Latest revision as of 02:09, 26 December 2014