A package to run on Hadoop to read XML data and expose Hive views
- Proposals: 2
- Remote
- #1152049
- Awarded
Description
Experience Level: Intermediate
Estimated project duration: less than 1 week
General information for the business: Read XML and expose Hive views
Description of requirements/functionality: A package or script to run in a big data Hadoop environment, which reads XML and CSV and exposes Hive views.
Input XMLs have nested tables with multiple parent child relationships.
The requirement is to write generic code that parses any XML document and converts it into multiple files, one per nested table, while maintaining keys for referential integrity.
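To make the requirement concrete, here is a minimal sketch of one possible flattening rule, written in plain Python rather than Spark. It assumes that an element with child elements or attributes is a record (one row in a table named after its tag) and that a childless, attribute-free element is a column on its parent row. The key column names `row_id`, `parent_table` and `parent_row_id` are hypothetical, and XML namespaces and mixed content are ignored.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import count


def flatten_xml(xml_text):
    """Split one XML document into flat tables keyed by element tag.

    Assumption: row_id / parent_table / parent_row_id are surrogate key
    columns invented for this sketch; a source attribute or leaf with the
    same name would overwrite them.
    """
    tables = defaultdict(list)                   # tag -> list of row dicts
    counters = defaultdict(lambda: count(1))     # tag -> surrogate key generator

    def visit(elem, parent_table, parent_row_id):
        row_id = next(counters[elem.tag])
        row = {
            "row_id": row_id,
            "parent_table": parent_table,
            "parent_row_id": parent_row_id,
        }
        row.update(elem.attrib)                  # attributes become columns
        tables[elem.tag].append(row)
        for child in elem:
            if len(child) == 0 and not child.attrib:
                # Leaf element: store its text as a column on this row.
                row[child.tag] = (child.text or "").strip()
            else:
                # Nested record: its own table, linked back by key.
                visit(child, elem.tag, row_id)

    visit(ET.fromstring(xml_text), None, None)
    return dict(tables)


SAMPLE = """
<orders>
  <order number="A1">
    <customer>Ann</customer>
    <item sku="x"><qty>2</qty></item>
    <item sku="y"><qty>1</qty></item>
  </order>
</orders>
"""

tables = flatten_xml(SAMPLE)
for name, rows in tables.items():
    print(name, rows)
```

Each table produced this way could then be written out as a separate file. Every child row keeps the key of the row it was nested under, which is what preserves referential integrity after the split.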
Converted XML should be stored in Parquet file format on HDFS using Spark/Spark Streaming. MapReduce should be avoided.
Further digging suggests that using Spark on Hive could be much faster.
Should be delivered within a week.
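One way the Parquet output might be exposed to Hive is an external table per Parquet directory plus a view that joins child to parent. The sketch below only builds the HiveQL strings. The table names, column types and HDFS path are made up for illustration, and it assumes each child table carries a hypothetical `parent_row_id` column pointing at the parent's `row_id`.

```python
def external_table_ddl(table, columns, location):
    """Return HiveQL for an external table over a Parquet directory.

    columns is a list of (name, hive_type) pairs chosen by the caller.
    """
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {cols}\n)\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def join_view_ddl(view, parent, child, parent_cols, child_cols):
    """Return HiveQL for a view joining a child table to its parent.

    Output columns are prefixed with the table name to avoid duplicates.
    Assumption: the join keys are named parent_row_id and row_id.
    """
    select = ",\n  ".join(
        [f"p.`{c}` AS `{parent}_{c}`" for c in parent_cols]
        + [f"c.`{c}` AS `{child}_{c}`" for c in child_cols]
    )
    return (
        f"CREATE VIEW IF NOT EXISTS `{view}` AS\n"
        f"SELECT\n  {select}\n"
        f"FROM `{parent}` p\n"
        f"JOIN `{child}` c ON c.`parent_row_id` = p.`row_id`"
    )


# Hypothetical tables and path, for illustration only.
item_ddl = external_table_ddl(
    "item",
    [("row_id", "BIGINT"), ("parent_row_id", "BIGINT"), ("sku", "STRING")],
    "hdfs:///data/parquet/item",
)
view_ddl = join_view_ddl("order_items", "order", "item", ["number"], ["sku"])
print(item_ddl)
print(view_ddl)
```

With a Hive-enabled SparkSession, statements like these could be passed to `spark.sql(...)` after the rows have been written with `DataFrame.write.parquet(...)`, which keeps the whole pipeline off MapReduce.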
Extra notes:
Needs to be a complete package that can be installed very easily without additional support or configuration. A step-by-step installation note is needed.
To begin with, we can assume the XML file is not corrupt. If it is corrupt, reject the file and push it into a reject folder on HDFS.
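The reject rule could be sketched as below, assuming "corrupt" means "not well-formed XML" (the posting does not define it further). The `accept` and `reject` callbacks are hypothetical hooks; in a real deployment the reject hook would move the file into the HDFS reject folder, for example via `hdfs dfs -mv`.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_bytes):
    """Return True if the bytes parse as XML, False otherwise."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False


def route(files, accept, reject):
    """Send each (name, data) pair to accept() or reject().

    Assumption: both callbacks take (name, data); what they do with the
    file (convert it, move it on HDFS) is left to the caller.
    """
    for name, data in files:
        if is_well_formed(data):
            accept(name, data)
        else:
            reject(name, data)


accepted, rejected = [], []
route(
    [("good.xml", b"<a><b>1</b></a>"), ("bad.xml", b"<a><b>1</a>")],
    accept=lambda name, data: accepted.append(name),
    reject=lambda name, data: rejected.append(name),
)
print("accepted:", accepted)
print("rejected:", rejected)
```

Parsing the whole file only to validate it is wasteful for large inputs; a streaming parser would be a reasonable substitute once file sizes are known.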
What do you need from me?
Hamsa B.
- 100% (2)
- Projects Completed: 4
- Freelancers worked with: 4
- Projects awarded: 15%
- Last project: 13 Dec 2017
- United Kingdom