Hadoop Projects
Looking for freelance Hadoop jobs and project work? PeoplePerHour has you covered.
Past "Hadoop" Projects
Data/code integrity, recovery, deployments; investigations
Data Detective Wanted! Join the Frontline of Data Integrity

Are you a data whiz with a Sherlock Holmes streak? Do you thrive on untangling complex code and restoring order to chaotic data? Do you get a thrill from uncovering hidden insights and preventing potential disasters? Then join our crack team of Data Scientist Engineers! In this critical role, you'll be:
- The guardian of data integrity: ensure the accuracy, completeness, and security of our data pipelines and codebases.
- The data recovery hero: diagnose and troubleshoot data corruption, losses, and inconsistencies.
- The deployment maestro: oversee smooth and efficient deployments of new data models and code updates.
- The investigative sleuth: uncover anomalies, identify root causes of data issues, and recommend solutions.
- The code whisperer: understand and maintain our complex data infrastructure, from algorithms to pipelines.

You'll be a perfect fit if you have:
- A strong understanding of data science principles and best practices.
- Expertise in programming languages like Python, Java, or Scala.
- Experience with data engineering tools and technologies (e.g., Spark, Hadoop, Airflow).
- Excellent analytical and problem-solving skills with a passion for data exploration.
- A keen eye for detail and a meticulous approach to data quality.
- Strong communication and collaboration skills to work effectively with cross-functional teams.

This is more than just a job; it's an adventure! Join us and become a data hero, ensuring the highest standards of data integrity and driving critical insights for our business. Ready to crack the code and solve the data mysteries? Apply today!

Bonus points for:
- Experience with data security and privacy practices.
- Familiarity with cloud computing platforms like AWS or Azure.
- Passion for data storytelling and visualization.

Don't let this opportunity slip away! Apply now and unleash your inner data detective!
Docker Datalake running on Rocky Linux v9.2 Virtual Machines
When you set up the solution below, please keep in mind how I am going to distribute it in my lab (number of VMs) and that it is all part of the same "datalake" solution:

- Hadoop v3.3.4 on Docker: cluster of 3 VMs: 1 master + 2 slaves (master hostname=VM1, slave1 hostname=VM2, slave2 hostname=VM3). HDFS data should live at the OS level at /mnt/voldatalake; this mountpoint should be mounted as a volume inside the Hadoop container to save data to (it exists at the OS level but is writable at the Docker level as well). Hadoop replication should be on across all datanodes (any change to one node's data replicates to the others). Datanodes, namenodes, YARN, the Resource Manager, and Hive should all be installed and set up correctly.
- Hive v3.1.2 on Docker: Apache Hive should run on the same Hadoop cluster/VMs.
- Apache Spark v3.4.0 on Docker: cluster of 3 VMs: 1 master + 2 slaves (master hostname=VM4, slave1 hostname=VM5, slave2 hostname=VM6).
- Presto (latest version) on Docker: cluster of 2 VMs: 1 master + 1 slave (master hostname=VM7, slave hostname=VM8).
- Python v3.9.1 on Docker: Python v3.9.1 should be installed and working on all the Docker VMs/solutions above.

Notes:
1 - All firewall ports should be allowed and mapped 1:1 (i.e. port 9870 in Docker should be port 9870 outside Docker, and so on), so everything can be accessed from outside the VMs/Docker.
2 - Everything should be managed with Git, with files that are easily modifiable and easily installable (in case ports, services, extra nodes, or versions need to be added or changed; for example, if I want to use Hadoop v3.5.6 instead of v3.5.4, or add an extra worker node to one of the solutions above).
3 - Docker images can be easily modified/changed to reflect new updates (whenever needed).
4 - Cheat-sheet documentation should be provided explaining how to install/deploy, and where the files/config files/variable lines need to be updated for the changes mentioned above.

Please reach out if you need more info or clarification.
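The per-VM layout described above can be sketched as a Compose file fragment. This is a hypothetical sketch for the master VM only, not a tested deployment: the image name/tag, command, and service name are assumptions, chosen to show where the version pin, the 1:1 port mapping, and the /mnt/voldatalake volume mount would go.

```yaml
# Hypothetical docker-compose.yml fragment for VM1 (Hadoop master).
# Image tag, command, and service name are illustrative assumptions.
version: "3.8"
services:
  namenode:
    image: apache/hadoop:3.3.4          # pin the version here; change one line to swap releases
    hostname: VM1
    command: ["hdfs", "namenode"]
    ports:
      - "9870:9870"                     # 1:1 mapping, per note 1 (same port inside and outside)
      - "8020:8020"
    volumes:
      - /mnt/voldatalake:/mnt/voldatalake   # OS-level mountpoint, writable inside the container
```

Keeping the tag and ports in one file like this is what makes note 2 (easy version/port/node changes via Git) practical.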
Data Engineer
As a Data Engineer, you will be responsible for designing, developing, and maintaining the systems and infrastructure needed to process and manage large volumes of data. Your role will involve working with various data sources, transforming data into usable formats, and ensuring data quality and reliability. Additionally, you will collaborate with cross-functional teams to develop scalable data solutions and optimize data workflows.

Key Responsibilities:
- Data Pipeline Development: Design and implement data pipelines to extract, transform, and load (ETL) data from various sources into databases or data warehouses. Develop efficient data integration processes that handle large volumes of structured and unstructured data.
- Database Design and Optimization: Work closely with data architects and analysts to design and optimize database structures and schemas for efficient data storage and retrieval. Ensure proper indexing, partitioning, and data organization to maximize query performance.
- Data Transformation and Cleansing: Develop scripts and workflows to clean, transform, and preprocess raw data into usable formats. Apply data quality checks and validation techniques to ensure accuracy and consistency of data.
- Data Modeling and Warehousing: Design and implement data models for data warehousing and reporting purposes. Develop and maintain data marts and dimensional models to support analytical queries and reporting needs.
- Performance Tuning and Optimization: Monitor and analyze database performance, identifying and resolving performance bottlenecks. Optimize SQL queries, indexes, and database configurations to enhance system performance and scalability.
- Data Security and Governance: Implement data security measures, access controls, and data encryption techniques to protect sensitive information. Ensure compliance with data governance policies and regulations.
- Collaboration and Integration: Collaborate with data scientists, analysts, and other stakeholders to understand data requirements and develop solutions that meet business needs. Integrate data from various systems and platforms to create a unified view of data.
- Documentation: Maintain comprehensive documentation of database designs, data models, ETL processes, and workflows. Document data standards, data dictionaries, and data lineage for effective data management and governance.

Qualifications:
- Bachelor's degree in Computer Science, Information Systems, or a related field (or equivalent work experience).
- Proven experience as a Data Engineer or in a similar role.
- Strong proficiency in SQL and experience with relational databases (e.g., Oracle, MySQL, Microsoft SQL Server).
- Familiarity with big data technologies (e.g., Hadoop, Spark, Hive) and NoSQL databases (e.g., MongoDB, Cassandra).
- Experience with ETL tools and data integration frameworks.
- Knowledge of data modeling concepts and dimensional modeling.
- Understanding of data warehousing and business intelligence concepts.
- Proficiency in programming languages like Python, Java, or Scala.
- Familiarity with cloud-based data platforms (e.g., AWS, Azure, GCP).
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration skills.
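The "transformation and cleansing" plus "data quality checks" responsibilities above usually reduce to small, testable steps. A minimal Python sketch follows; the field names and validation rules are hypothetical examples, not any specific employer's pipeline.

```python
# Minimal ETL cleanse/validate sketch. Field names ("user_id", "email",
# "amount") and the rules are hypothetical illustrations.

def clean_record(raw: dict) -> dict:
    """Normalize one raw record into a usable format."""
    return {
        "user_id": int(raw["user_id"]),
        "email": raw["email"].strip().lower(),
        "amount": round(float(raw["amount"]), 2),
    }

def validate(record: dict) -> bool:
    """Basic data-quality checks: fields present and sane."""
    return (record["user_id"] > 0
            and "@" in record["email"]
            and record["amount"] >= 0)

def run_pipeline(rows):
    """Clean every row, then keep only the rows that pass validation."""
    cleaned = (clean_record(r) for r in rows)
    return [r for r in cleaned if validate(r)]

rows = [
    {"user_id": "7", "email": " Alice@Example.com ", "amount": "19.99"},
    {"user_id": "-1", "email": "bad", "amount": "0"},  # fails validation
]
result = run_pipeline(rows)
```

Separating cleaning from validation like this is what makes the quality checks unit-testable before anything reaches the warehouse.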
Spark and Kafka project help needed!
Hi, I am looking for help from someone with data engineering experience. If you have experience with Spark, Databricks, Kafka, Airflow, SQL, and the Azure cloud, including big data processing tools such as Hadoop and HBase, please get in touch.
Tutor needed to help install Hadoop, MySQL, Hive, Sqoop & Spark
A computer science student requires a tutor with a flexible schedule who can connect online and guide me through the installation and configuration of Hadoop, Hive, MySQL, Sqoop, and Spark on a virtual machine.
Machine Learning Engineer
We are looking for talented people who will put our customers at the center of everything we do. We are seeking candidates who embrace diversity, equity, and inclusion in a workplace where everyone feels valued and inspired. Help us build a better Wells Fargo. It all begins with outstanding talent. It all begins with you. We are currently seeking a Senior Software Engineer. The Artificial Intelligence (AI) technology team is looking for a highly motivated and experienced agile application developer. The right candidate will have expert-level experience in Python, Spark, and the Hadoop big data environment, and preferably public cloud experience with GCP, AWS, or Azure.
Help teach Hadoop 3.3.0 multinode installation
I have been trying to set up a Hadoop 3.3.0 multinode cluster on EC2 machines but have not been successful, even though the single-node installation works. I need someone who has successfully installed it before and can walk me through it. I don't expect this to take longer than one or two hours.
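For reference, the usual gap between a working single-node setup and a multinode one is small: every node must point at the same NameNode address, and the master must know its workers. A sketch, under the assumption that "master" is a hostname resolvable from all nodes (e.g. via /etc/hosts or EC2 private DNS):

```xml
<!-- etc/hadoop/core-site.xml, identical on every node.
     "master" is a hypothetical hostname standing in for the real one. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```

In addition, the master's etc/hadoop/workers file lists each worker hostname, one per line; a common EC2-specific pitfall is security groups blocking the HDFS/YARN ports between instances, which makes workers fail to register even when the configuration is correct.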
Need a coding expert
I have 15 to 20 questions in each of the languages below.
- In C: Functions, Loops, Data Structures, Strings, I/O and File I/O
- In C++: Standard Library, Syntax and Semantics, Compiler and Preprocessor, Object-Oriented Programming, Pointers and Data Structures
- In Java: Java Data Structures, Java Runtime, Java Fundamentals, Java Access Modifiers, Java System Fundamentals
- In Python: Fundamentals, Sorting, Data Structures, Advanced Concepts, Object-Oriented Programming
- In Hadoop: MapReduce, Hadoop Common, Hadoop Concepts, Hadoop Components, Hadoop Optimization
- In OOP: Design Patterns, Domain Modeling, Exception Handling, Software Development, Four Principles of OOP

Here are some samples:
- In JS: events, NaN, Ajax, Math, IIFEs
- In JSON: Storage, JSON-P, Syntax, Objects, Queries
- In React.js: Tools, Basics, Elements, Hooks, Components

A sample in JS:
Question: What is the HTTP verb to request the contents of an existing resource?
Answer 1: GET
Answer 2: POST
Answer 3: PATCH
Answer 4: DELETE
Learning Hadoop with Java
Hi, I am learning Hadoop and want to execute sample Java programs on the topics below. Timeline: 2 days.
1. Secondary sort
2. Custom partitioner
3. Tool runner
4. Map-side join
5. Reduce-side join
6. Compression
7. Counters
8. Health check for HDFS, Hive, Hadoop, YARN
9. Stocks exercise on sample data using MapReduce and Java
Need tutorial content for Engineering student to practice on Python and Java
Require tutorial format for:
1. Using an enterprise IDE - IntelliJ Community Edition (include adding Python and Java plugins to run the programs below)
2. Understanding YARN daemons, configuration, and fault tolerance

Execute the following programs in IntelliJ:
3. Map-Reduce word count using Python
4. Map-Reduce word count using Java
5. PageRank algorithm using MapReduce in Python
6. PageRank algorithm using MapReduce in Java

Need clear steps with screenshots covering installation, steps taken, execution, logs/outputs, and the prerequisites to run the above programs. I do not have time to test this as I'm tied up with other projects, so I am pasting the source code links for your reference:
- Python code for word count: https://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
- Java code for word count: https://bigdataproblog.wordpress.com/2016/05/20/developing-hadoop-mapreduce-application-within-intellij-idea-on-windows-10/
- PageRank code (Python): https://en.wikipedia.org/wiki/PageRank
- PageRank code (Java): https://github.com/luqmanarifin/page-ranker-hadoop/tree/master/src/com/page_ranker

***Need this project in 2 days. Based on this I can provide more projects in the future.
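The Python word-count item above follows the Hadoop Streaming pattern in the linked tutorial: a mapper script emits (word, 1) pairs and a reducer sums them. A condensed sketch of that logic, written as plain functions over iterables (rather than sys.stdin scripts) so it can be sanity-checked locally before submitting to the cluster:

```python
# Hadoop Streaming word-count logic, condensed for local testing.
# In the real tutorial these are two stdin/stdout scripts
# (mapper.py and reducer.py) run via the hadoop-streaming jar.

def mapper(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.strip().split():
            yield word, 1

def reducer(pairs):
    """Sum the counts per word. Hadoop delivers pairs grouped/sorted
    by key; a dict accumulator handles either ordering."""
    counts = {}
    for word, n in pairs:
        counts[word] = counts.get(word, 0) + n
    return counts

# Simulate the shuffle step by sorting the mapper output by key.
pairs = sorted(mapper(["foo bar foo", "bar baz"]))
counts = reducer(pairs)
```

The same mapper/reducer split is what the Java word-count link implements with Mapper and Reducer classes, so testing the logic this way transfers directly.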
Technical writing: Articles on Hadoop, SAP, and data science
Need quality proposals. I want someone to write articles on Hadoop, big data, and data science. The articles need to be written with relevance to SAP software. I need each article to be anywhere between 5 and 7 pages long, with excellent diagrams. There is a ton of information available online, but you need to consolidate everything and write it in your own words.
Technical writing
Posting second time: Need quality proposals. I want someone to write articles on Hadoop, big data, and data science. The articles need to be written with relevance to SAP software. I need each article to be anywhere between 4 and 6 pages long, with excellent diagrams. There is a ton of information available online, but you need to consolidate everything and write it in your own words.
Opportunity
Implementation of a data mining algorithm in PySpark
PySpark, data mining, and Hadoop are the skills that I am looking for.
AWS Hadoop Cloud support and operations
We have built out a big data cloud infrastructure and are looking for someone to provide ongoing operations support if the system goes down, plus general maintenance to avoid that downtime. We are running on Amazon EC2 instances. We are looking for people who can manage everything from load balancers to Hadoop. We have deployed the following services that run and manage the system: Hadoop, Zabbix, OpenVPN, Ambari Server, Ambari Agent, ZooKeeper, Kafka, HDFS, HBase, YARN, MapReduce2, Tez, Hive, Pig, Oozie, Spark, Slider, Ambari Infra, and Ambari Metrics.
Tutor on using the Hortonworks Sandbox and Azure Hadoop setup
General information for the business: Looking to set up Azure Hadoop and the Hortonworks Sandbox; having problems configuring, need assistance.
Description of requirements/functionality: Need someone to talk me through the process of setting up Hadoop on Azure.
OS requirements: Windows
Extra notes:
Tutoring on setting up Azure Hadoop and the Hortonworks Sandbox
General information for the business: Managing sport data.
Description of requirements/functionality: Need to understand how to set up Hadoop on Azure, and also the Sandbox.
OS requirements: Windows
Extra notes:
Urgent
I need a shell script on Hive to create ALTER TABLE scripts
General information for the business: Hadoop, Hive.
Description of requirements/functionality: I need a shell script on Hive to create ALTER TABLE scripts that convert varchar(1), varchar(2), and varchar(3) columns into varchar(15).
Specific technologies required: Hive, Hadoop, Unix, Linux.
OS requirements: Linux
Extra notes: In this script I should also be able to choose the tables by a common string.
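The core logic the posting asks for (in the final shell script, presumably a loop over `hive -e "DESCRIBE <table>"` output) can be sketched as follows. This is a hedged illustration, not the deliverable: the SCHEMA dict is a hypothetical stand-in for metadata read from the Hive metastore, and the emitted DDL uses Hive's `ALTER TABLE ... CHANGE` form.

```python
# Sketch: generate ALTER TABLE statements that widen varchar(1)/(2)/(3)
# columns to varchar(15), limited to tables matching a common string.
# SCHEMA is hypothetical sample metadata, not real table definitions.
import re

SCHEMA = {
    "sales_fact": {"region_cd": "varchar(2)", "note": "string"},
    "sales_dim": {"flag": "varchar(1)"},
    "inventory": {"code": "varchar(3)"},
}

def alter_statements(schema, table_filter):
    """Return Hive DDL widening narrow varchar columns to varchar(15)
    for every table whose name contains table_filter."""
    stmts = []
    for table, cols in schema.items():
        if table_filter not in table:
            continue  # "choose the tables by common string"
        for col, ctype in cols.items():
            if re.fullmatch(r"varchar\([123]\)", ctype):
                stmts.append(
                    f"ALTER TABLE {table} CHANGE {col} {col} varchar(15);"
                )
    return stmts

stmts = alter_statements(SCHEMA, "sales")
```

A real shell version would substitute `hive -e "SHOW TABLES"` and `DESCRIBE` parsing for the dict, but the regex filter and the generated `CHANGE col col varchar(15)` statement carry over unchanged.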