I would like a program built which can draw out key data points from a batch of PDF t
£15/hr(approx. $19/hr)
- Posted:
- Proposals: 9
- Remote
- #1801286
- Awarded
Description
Experience Level: Intermediate
General information for the business: Research
Description of requirements/functionality: I have a whole collection of movie scripts, all in the same format, and I would like someone to build a program which analyses each document and pulls out key information.
Movie scripts follow a very rigid formatting standard, and most are written in a handful of writing programs, which means we can be sure that all my files will follow the same consistent formatting. To give you an idea, check out this image https://www.nyfa.edu/student-resources/wp-content/uploads/2014/06/final-draft-screen-shot.png
Most of the work in identifying elements will be achieved via the indentation, capitalization and certain key phrases (such as EXT or INT for the start of a scene).
I can help give details about what each of these are.
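To illustrate the kind of rule this implies, here is a minimal Python sketch that labels a single line of extracted script text using only indentation, capitalization and the INT/EXT key phrases. The indentation thresholds and the label names are assumptions made for illustration; the real values depend on how the text is extracted from the PDFs and would need tuning against sample files.

```python
import re

# Assumed indentation thresholds (in leading spaces). These are illustrative
# only and must be tuned against text extracted from the actual PDFs.
CHARACTER_INDENT = 20
DIALOGUE_INDENT = 10

# Scene headings start with INT or EXT as a whole word (e.g. "INT." or "EXT.").
SCENE_HEADING = re.compile(r"^(INT|EXT)\b")


def classify_line(line: str) -> str:
    """Label one line of extracted script text by indentation and case."""
    text = line.strip()
    if not text:
        return "blank"
    indent = len(line) - len(line.lstrip(" "))
    if SCENE_HEADING.match(text):
        return "scene_heading"
    # Character cues are deeply indented and written entirely in capitals.
    if indent >= CHARACTER_INDENT and text == text.upper():
        return "character"
    if indent >= DIALOGUE_INDENT:
        return "dialogue"
    return "action"
```

Other uppercase, indented elements (for example transitions) would need their own rules, which is where the formatting details mentioned above come in.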
From each PDF document the program looks at, I would like two types of data:
1. Basic data – such as number of pages, the data on the cover page (e.g. https://www.writersstore.com/system/imagemanager/screenplay-title-page-example.gif) etc.
2. More complex data – such as the number of times a character speaks, the description used to define a character, etc. These are all identifiable by strict formatting rules, which I can explain. The person writing the tool will need to work out how to code the program to look for these particular patterns.
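As a rough sketch of both kinds of data, the Python function below takes the text of one script and returns a page count plus the number of times each character speaks. It assumes the text has already been extracted from the PDF with indentation preserved and with pages separated by form feed characters (as the `pdftotext` tool does by default). The function name, the indentation threshold and the rule for skipping lines that end in a colon are all assumptions for illustration.

```python
import re
from collections import Counter

CHARACTER_INDENT = 20  # assumed threshold; tune against real extracted text

# Trailing extensions such as (V.O.) or (CONT'D) are not part of the name.
EXTENSION = re.compile(r"\s*\(.*?\)\s*$")


def summarize_script(text: str) -> dict:
    """Return the page count and per-character speech counts for one script."""
    pages = text.split("\f")
    # Extraction tools often leave a trailing form feed; drop the empty tail.
    if pages and not pages[-1].strip():
        pages.pop()

    speeches = Counter()
    # str.splitlines() also treats form feeds as line boundaries.
    for line in text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip(" "))
        is_cue = (
            indent >= CHARACTER_INDENT
            and stripped == stripped.upper()
            and any(ch.isalpha() for ch in stripped)
            and not stripped.endswith(":")  # assumed: skips "CUT TO:" etc.
        )
        if is_cue:
            name = EXTENSION.sub("", stripped)
            if name:
                speeches[name] += 1
    return {"pages": len(pages), "speeches": dict(speeches)}
```

Each character cue is counted as one speech, so a speech that continues after an interruption, marked (CONT'D), is counted again. Whether that is the desired behaviour is a decision for the requirements discussion.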
Right now I intend this program just to be used by me privately in my research. If it proves useful then I may take it further but that doesn’t seem very likely now.
I assume that the best way of dealing with this is to build a very basic version (grabbing the data listed above) and then for us to add functionality as needed.
To apply, please let me know the following things:
• How you would tackle this (e.g. the coding language you’d use)
• How you would bill your time (e.g. flat fee or by the hour) and at what rates.
• An estimate of the time / cost involved. I appreciate that this may change when we figure out the detail but it would be useful to get a sense of scale from your point of view.
• Examples of your past work
Because I won’t know how to pick between different applicants, I will use the information above to generate a shortlist of a few people. Then we can chat more about the requirements, meaning you can generate a more detailed plan and quote.
You can ignore the amount this job is listed with. It's a dummy amount just so I can post the job. Write your quote in the application text and we will talk before I accept anything.
Any applications which only say “We can do this!” won’t be considered. I do need a bit of context on you and how you’d tackle it in order to be able to pick between applicants.
Thank you.
Extra notes:
PPH User P.
- 100% (208)
- Projects completed: 104
- Freelancers worked with: 135
- Projects awarded: 66%
- Last project: 18 Mar 2019
- United Kingdom
Clarification Board
Hello there.
I would like to clarify one thing first:
Does the program need to use image recognition algorithms to detect indentation, words, etc., or can the whole task be handled just by parsing the data in your PDFs as text?
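One way to answer this per file is to try text extraction first and only fall back to image recognition (OCR) when almost nothing comes out. The sketch below is a simple heuristic under that assumption: it takes the text already extracted from each page and guesses whether the PDF is a scan. The function name and the 50-character threshold are hypothetical.

```python
from typing import Sequence


def needs_ocr(page_texts: Sequence[str], min_chars_per_page: int = 50) -> bool:
    """Guess whether a PDF is a scan: True if most pages yield almost no text."""
    if not page_texts:
        return True
    # Count pages whose extracted text is shorter than the assumed threshold.
    sparse = sum(
        1 for text in page_texts if len(text.strip()) < min_chars_per_page
    )
    return sparse > len(page_texts) / 2
```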