Extract structured and unstructured data from complex financial documents using a proprietary extraction framework built on in-house NLP techniques.
Requirements
- Strong Python experience
- Experience with regular expressions and NLP libraries
- Experience with one or more of the following Python libraries: PDFLib, ply.lex, ply.yacc
- Experience parsing PDF and DOCX documents
- Understanding of data warehousing
- Experience with DevOps
- Experience with databases
- Proficiency with version control tools and platforms, such as Git, GitHub, and Bitbucket
- Experience with GraphQL (preferred but not required)
Responsibilities
- Development of proprietary NLP algorithms using, in part, spaCy, NLTK, and other libraries
- Development of reusable code and libraries
- Development of additional tools and interfaces to improve the efficiency and precision of our data extraction methodology
- Traditional backend work involving anything from REST APIs to database queries
Other
- Minimum of a bachelor’s degree in a related field
- Experience with common project management tools and Agile development workflows
- Ingenuity, creativity, drive and determination
- Clear communication skills
- Strong organizational skills, including the ability to respond quickly in a fast-paced environment