Transcribe Geometry Model Data from a PDF report to an ASCII file
BRL-CAD
Status: Closed
Time to complete: 100 hrs
Mentors: Isaac Kamga, Deepak
Tags: docs, geometry, transcription, documentation, convert, text, 3D, ocr, pdf
Beginner

We have scans (PDF) of a number of reports documenting early geometric models in the COMGEOM format (a now obsolete format, but the models are interesting nonetheless). These reports contain the actual geometry defining the model as pages and pages of numbers and letters. Unfortunately, the quality is sufficiently poor that optical character recognition (OCR) has a very high rate of error.

This task is to attempt a manual transcription of the MEP-021A Generator Set model described in the report "A Combinatorial Geometry Computer Description of the MEP-021A Generator Set" (see the References list below for a link to download the PDF). One possible approach is to use Acrobat Reader or another PDF reader to select and copy the OCR text, paste it into a text file as a starting point, and then correct it by hand. Some error patterns may also allow semi-automated processing (for example, if 5 zeros in a row are commonly replaced with the letter "O" instead of the digit 0, a search and replace is in order). Approach it however you wish, but remember that the goal is not merely extracting the OCR text; it is producing an accurate transcription of the file. The OCR text can serve as a starting point, but it will NOT be accurate.
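
As a sketch of the semi-automated pass described above, the Python snippet below applies a hypothetical table of OCR look-alike substitutions (letter O to zero, lowercase l and capital I to one) only to tokens that look like decimal numbers, so solid types such as RCC and remarks such as CRANKSHAFT are left alone. The substitution table and the name fix_numeric_tokens are illustrative assumptions, not part of BRL-CAD; build the real table by comparing the OCR text with the scanned pages, and still proofread every line against the PDF.

```python
import re

# Hypothetical table of OCR look-alikes: letters that were probably digits.
# Extend or trim it after comparing the OCR text with the scanned pages.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})

# A token counts as a decimal number if, apart from an optional sign and a
# single decimal point, it contains only digits and the look-alike letters.
NUMERIC_LIKE = re.compile(r"[-+]?[0-9OolI]*\.[0-9OolI]+")


def fix_numeric_tokens(line: str) -> str:
    """Swap look-alike letters for digits inside decimal-number tokens only."""

    def fix(match):
        token = match.group(0)
        if NUMERIC_LIKE.fullmatch(token):
            return token.translate(OCR_DIGIT_FIXES)
        return token  # words such as RCC or CRANKSHAFT are left untouched

    # Rewrite one whitespace-delimited token at a time; each replacement is
    # the same length, so the spacing between tokens is not changed.
    return re.sub(r"\S+", fix, line)


if __name__ == "__main__":
    print(fix_numeric_tokens("1       RCC   -3O3.0232       O.OOOO  CRANKSHAFT"))
```

Because every substitution swaps one character for one character, the column layout of the line is preserved. Tokens without a decimal point (such as a solid number misread as "l") are deliberately not touched and still need to be checked by eye.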

The goal is to have a file that can be fed to BRL-CAD's comgeom-g importer to generate an accurate .g file. If the generator conversion goes well, there are a significant number of other models we are interested in (some of them considerably more complex) that can be used to create more tasks of this nature.

References:

  • http://www.dtic.mil/docs/citations/ADA073408

Code:

  • src/conv/comgeom

Please discuss your progress with the developers. Depending on the time required and the accuracy of your conversion, this task may be broken up into multiple tasks.

Uploaded Work
File name/URL | File size | Date submitted
TABLE A.asc | 32.1 KB | December 03 2014 17:37 UTC
TABLE A Final3.asc | 32.9 KB | December 05 2014 17:07 UTC
TABLE A Final4 Dec 8 2014.asc | 55.5 KB | December 08 2014 22:01 UTC
Comments
Andrew on December 1 2014 20:49 UTC: Task Claimed

I would like to work on this task.

Mandeep Kaur on December 1 2014 21:13 UTC: Task Assigned

This task has been assigned to schembora. You have 100 hours to complete this task, good luck!

Andrew on December 1 2014 21:15 UTC: Claim Removed

The claim on this task has been removed, someone else can claim it now.

Rexey on December 2 2014 01:29 UTC: Task Claimed

I would like to work on this task.

Mihai Neacsu on December 2 2014 01:30 UTC: Task Assigned

This task has been assigned to Rexey. You have 100 hours to complete this task, good luck!

Rexey on December 3 2014 17:38 UTC: Ready for review

The work on this task is ready to be reviewed.

Daniel_R on December 4 2014 16:30 UTC: Please maintain the columns

Please maintain the columns by inserting spaces; otherwise the comgeom-g importer has no chance of generating a .g file.
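
One way to keep the columns consistent is to generate each line from fixed-width fields instead of typing the spacing by hand. The Python sketch below assumes, purely for illustration, a 5-character solid-number field, a 5-character type field, six 10-character numeric fields right-justified with four decimal places, and a remark starting in column 71 of an 80-column line. The name format_card and all of these widths are hypothetical; confirm the real column positions against the report and the comgeom-g importer sources (src/conv/comgeom) before relying on it.

```python
# Hypothetical card layout (verify against the report and src/conv/comgeom):
NUM_WIDTH = 5        # solid number field
TYPE_WIDTH = 5       # solid type field, e.g. "RCC"
FIELD_WIDTH = 10     # width of each numeric field
DECIMALS = 4         # digits after the decimal point, as in the report
FIELDS_PER_CARD = 6  # numeric fields per line


def format_card(number, solid_type, values, remark=""):
    """Build one fixed-column line; number=None and type "" give a continuation line."""
    if len(values) > FIELDS_PER_CARD:
        raise ValueError("too many values for one card")
    head = ("" if number is None else str(number)).ljust(NUM_WIDTH)
    head += solid_type.ljust(TYPE_WIDTH)
    # Right-justify every value with a fixed number of decimals so that the
    # decimal points fall in the same column on every line.
    body = "".join(f"{v:{FIELD_WIDTH}.{DECIMALS}f}" for v in values)
    # Pad short lines so the remark always starts in the same column.
    body = body.ljust(FIELD_WIDTH * FIELDS_PER_CARD)
    return head + body + remark


if __name__ == "__main__":
    print(format_card(1, "RCC", [-303.0232, 0, 0, 697.4867, 0, 0], "CRANKSHAFT"))
    print(format_card(None, "", [12.7001, 0, 0, 0, 0, 0], "CRANKSHAFT"))
```

Generating lines this way makes the spacing a property of the layout constants rather than of manual editing, so a layout correction only has to be made in one place.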

Daniel_R on December 4 2014 16:30 UTC: Task Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Melange on December 5 2014 05:30 UTC: Task due soon

There are less than 24 hours left until the deadline, please submit your work soon.

Rexey on December 5 2014 17:07 UTC: Ready for review

The work on this task is ready to be reviewed.

Melange on December 6 2014 05:30 UTC: No more work can be submitted

Melange has detected that the deadline has passed and no more work can be submitted. The submitted work should be reviewed.

Sean on December 7 2014 05:12 UTC: Deadline extended

The deadline of the task has been extended by 2 days and 2 hours.

Sean on December 7 2014 05:12 UTC: Task Needs More Work

One of the mentors has sent this task back for more work. Talk to the mentor(s) assigned to this task to satisfy the requirements needed to complete this task, submit your work again and mark the task as complete once you re-submit your work.

Sean on December 7 2014 05:40 UTC: Columns

Rexey, this is looking a lot better, but there is still some column misalignment. Notice the 3rd and 4th lines of the file, which look like this:

1       RCC   -303.0232       0.0000  0.0000  697.4867        0.0000  0.0000  CRANKSHAFT
                12.7001 0.0000  0.0000  0.0000  0.0000  0.0000  CRANKSHAFT
With columns aligned, it should look like:

1       RCC     -303.0232       0.0000  0.0000  697.4867        0.0000  0.0000  CRANKSHAFT
                  12.7001       0.0000  0.0000    0.0000        0.0000  0.0000  CRANKSHAFT
Each item should be aligned with the column above it.  Notice how the REMARK item (CRANKSHAFT) ends up aligned and all of the decimal points align.
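
Checking this alignment by eye across a long file is error-prone. A small script can instead compare, for each line in a block, the columns of its decimal points and the column where its remark begins. The Python sketch below treats the first line as the reference and reports every line that differs from it or that contains a tab (whose visible width depends on the viewer). The names layout and misaligned_lines, and the rule of comparing against the first line, are assumptions made for this illustration; it also presumes every line in the block carries the same number of decimal fields, as in the CRANKSHAFT example.

```python
import re

# A decimal number such as -303.0232 or 0.0000; solid numbers like "1" and
# words like RCC contain no decimal point, so they are not matched.
NUMBER = re.compile(r"[-+]?\d*\.\d+")


def layout(line):
    """Return (decimal-point columns, remark start column) for one line."""
    matches = list(NUMBER.finditer(line))
    dots = [m.start() + m.group(0).index(".") for m in matches]
    rest = line[matches[-1].end():] if matches else line
    # The remark starts at the first non-space character after the last number.
    remark_col = len(line) - len(rest.lstrip()) if rest.strip() else None
    return dots, remark_col


def misaligned_lines(lines):
    """Return 1-based numbers of lines whose layout differs from the first line.

    Lines containing tabs are always reported, because their visible
    alignment depends on the viewer's tab width rather than on character columns.
    """
    if not lines:
        return []
    reference = layout(lines[0])
    bad = []
    for number, line in enumerate(lines, start=1):
        if "\t" in line or layout(line) != reference:
            bad.append(number)
    return bad
```

Running it over each solid's primary and continuation lines before submitting would point straight at the lines that still need spaces added or removed.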

Make sense? One more pass and this will be complete. Note that there are a lot more tasks like this one too. :)
Melange on December 8 2014 07:12 UTC: Task due soon

There are less than 24 hours left until the deadline, please submit your work soon.

Rexey on December 8 2014 22:01 UTC: Ready for review

The work on this task is ready to be reviewed.

Sean on December 9 2014 06:47 UTC: Task Closed

Congratulations, this task has been completed successfully.

Sean on December 9 2014 06:49 UTC: Beautiful work!

Rexey, that looks MUCH much better, thank you!  How long did this take you in terms of hours?

There are other tasks we have just like this one, but we want to make sure they're appropriately scoped.  We can create many more too, if you're interested in getting the t-shirt or going for the top-10!  Thank you for your efforts, well done.

Rexey on December 9 2014 16:32 UTC: Effort

It took a few days, some of which was spent trying to get the formatting right. I would be interested in more tasks of this type. Thanks.