Transcribe Geometry Model Data from a PDF report to an ASCII file Helicopter #2
BRL-CAD
Status: Closed
Time to complete: 100 hrs
Mentors: Gauravjeet Singh, Ishwerdas
Tags: docs, geometry, transcription, documentation, convert, text, 3D, ocr, pdf
Beginner

We have scans (PDF) of a number of reports documenting early geometric models in the COMGEOM format (a now obsolete format, but the models are interesting nonetheless). These reports contain the actual geometry defining the model as pages and pages of numbers and letters. Unfortunately, the quality is sufficiently poor that optical character recognition (OCR) has a very high rate of error.

This task is to attempt the manual transcription of a portion of the Black Hawk Helicopter model described in the report "Computer Description of Black Hawk Helicopter" (see the References list below for the link that will let you download the PDF). One possible approach is to use Acrobat Reader or some other PDF reader to select and copy the OCR text, paste that into a text file as a starting point, and then manually correct it. There may also be some patterns that allow for semi-automated processing (for example, if 5 zeros in a row are commonly replaced with the character "O" instead of 0, a search and replace is in order). However you wish to approach it is fine, but remember that the goal is not just the extraction of the OCR text but the production of an accurate transcription of the file. The OCR text can be used as a starting point, but it will NOT be accurate.
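As a rough illustration of the semi-automated search and replace mentioned above, here is a minimal Python sketch. The confusion table (`O`/`o` for 0, `l`/`I` for 1), the helper names `looks_numeric`, `fix_token`, and `fix_line`, and the sample line are all assumptions made for this example; they do not come from the report. The sketch only touches tokens that already look numeric, so every corrected line still needs to be checked against the scan by hand.

```python
import re

# Hypothetical confusion table: letters OCR may emit in place of digits.
OCR_DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1"})
ALLOWED = set("0123456789OolI.")

def looks_numeric(token):
    """Return True if the token is probably a number with OCR letter errors."""
    body = token.lstrip("+-")
    if not body or any(ch not in ALLOWED for ch in body):
        return False
    # Require a real digit, or a run made only of the letter O (e.g. "OOOOO").
    has_digit = any(ch.isdigit() for ch in body)
    run_of_o = len(body) >= 2 and set(body) == {"O"}
    return has_digit or run_of_o

def fix_token(token):
    """Swap confusable letters for digits, but only in numeric-looking tokens."""
    return token.translate(OCR_DIGIT_FIXES) if looks_numeric(token) else token

def fix_line(line):
    """Fix each whitespace-delimited token while keeping the original spacing."""
    return re.sub(r"\S+", lambda m: fix_token(m.group()), line)

if __name__ == "__main__":
    # Made-up sample line, not taken from the report.
    print(fix_line("RPP  1O.5 -2O0.O OOOOO 3l"))
```

Tokens such as solid-type names are left alone because they contain letters outside the confusion table. Substitutions such as 6 versus 8 cannot be detected this way and still need a human reading the page.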

The preferred format to provide the pages in is a comma-separated value ASCII text file, which is suitable for post-processing.
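One way to produce such a file from corrected, whitespace-separated rows is sketched below with Python's standard `csv` module. The task does not specify the column layout of the report's tables, so splitting on whitespace is an assumption, and the function name `rows_to_csv` and the sample rows are invented for illustration.

```python
import csv
import io

def rows_to_csv(lines):
    """Convert whitespace-separated text rows into comma-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines left over from copy and paste
            writer.writerow(fields)
    return buffer.getvalue()

if __name__ == "__main__":
    # Made-up rows, not taken from the report.
    sample = ["1 RPP 0.0 10.5", "", "2 SPH 1.0 2.0"]
    print(rows_to_csv(sample), end="")
```

If a table has empty cells, whitespace splitting will shift the columns, so those rows would need to be entered by hand.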

The eventual goal is to have a file that can be fed to BRL-CAD's comgeom-g importer to generate an accurate .g file. The description of this target is a couple hundred text pages (which will take much longer than a single GCI task if you're doing correctness checking!), so there will be multiple tasks for pieces of the file. For this task, please submit a CSV file with the content of the tables on pages 102-133.

References:

Please discuss your progress with the developers.

Additional information on comgeom

Uploaded Work
File name/URL    File size    Date submitted
stuff.txt        111.5 KB     January 18 2015 15:58 UTC
Comments
Arnav on December 6 2014 12:28 UTC: Task Claimed

I would like to work on this task.

Ch3ck on December 6 2014 12:48 UTC: Task Assigned

This task has been assigned to Arnav. You have 100 hours to complete this task, good luck!

Arnav on December 7 2014 10:08 UTC: A problem has occurred

Sir,

Please give an example of what to do in this task.

I am somewhat unable to understand the main work of this task.

Regards,
Arnav

Arnav on December 7 2014 10:11 UTC: Claim Removed

The claim on this task has been removed, someone else can claim it now.

ello on December 8 2014 22:38 UTC: Task Claimed

I would like to work on this task.

ello on December 8 2014 22:38 UTC: Claim Removed

The claim on this task has been removed, someone else can claim it now.

xirow on December 24 2014 06:28 UTC: Task Claimed

I would like to work on this task.

Harmanpreet on December 24 2014 07:16 UTC: Task Assigned

This task has been assigned to xirow. You have 100 hours to complete this task, good luck!

Melange on December 28 2014 11:16 UTC: Task Reopened

Melange has detected that the final deadline has passed and it has reopened the task.

Phoebe on January 1 2015 14:58 UTC: Task Claimed

I would like to work on this task.

Mihai Neacsu on January 1 2015 15:24 UTC: Task Assigned

This task has been assigned to Phoebe. You have 100 hours to complete this task, good luck!

Melange on January 5 2015 19:24 UTC: Task Reopened

Melange has detected that the final deadline has passed and it has reopened the task.

Erica Weng on January 6 2015 06:25 UTC: Task Claimed

I would like to work on this task.

Erica Weng on January 6 2015 06:37 UTC: Claim Removed

The claim on this task has been removed, someone else can claim it now.

Jacob L on January 7 2015 10:47 UTC: Task Claimed

I would like to work on this task.

Popescu Andrei on January 7 2015 10:47 UTC: Task Assigned

This task has been assigned to Jacob L. You have 100 hours to complete this task, good luck!

Jacob L on January 11 2015 07:03 UTC: Claim Removed

The claim on this task has been removed, someone else can claim it now.

Mou Yan Qiao on January 16 2015 07:18 UTC: Task Claimed

I would like to work on this task.

Popescu Andrei on January 16 2015 07:19 UTC: Task Assigned

This task has been assigned to Mou Yan Qiao. You have 81 hours to complete this task, good luck!

Mou Yan Qiao on January 16 2015 07:23 UTC: Claim Removed

The claim on this task has been removed, someone else can claim it now.

Vladimir Kuznetsov on January 17 2015 12:58 UTC: Task Claimed

I would like to work on this task.

Sean on January 17 2015 13:05 UTC: Task Assigned

This task has been assigned to Vladimir Kuznetsov. You have 51 hours to complete this task, good luck!

Vladimir Kuznetsov on January 17 2015 17:11 UTC: OMG

So that's a beginner task, huh?
I'll try to do my best, but there are some serious troubles:
1) The reader marks some of the pages not by lines, but by columns. This gives me a lot of lines in the file and takes a lot of time to sort.
2) Some digits are so bad that even I can't recognize them. If a computer does, it's a miracle.
3) Inaccuracy. The computer may recognize some digits not as they are (6 instead of 8, for example). I guess when you import this to MGED you'll get a heli-shaped figure, but not the helicopter itself.

This'll take a lot more than 50 hours, but, as I said above, I'll try to do my best.

Vladimir Kuznetsov on January 18 2015 15:58 UTC: Ready for review

The work on this task is ready to be reviewed.

Sean on January 18 2015 16:24 UTC: indeed a tedious

Hi Vladimir,

It is indeed considered a beginner task because it's not difficult. It's VERY tedious, but not at all "hard" in the sense that you don't really need to know much to complete the task. You just need to type what you see.

If we do these tasks again, we'll definitely make them into smaller page sets (maybe 10-20 pages per task). We have hundreds of old models like this one that are only available as terrible scans like this one. It is pretty much impossible for a computer to scan these due to the errors, which is exactly why we need a human to do it (and even then, sometimes it'll be impossible without guessing). It's the best we can do. ;)

Sean on January 18 2015 16:24 UTC: Task Closed

Congratulations, this task has been completed successfully.