Hadoop source-code change in the replication policy of HDFS
- Proposals: 2
- Remote
- #1626708
- Expired
Description
Experience Level: Intermediate
General information for the business: Hadoop replication policy
Kind of development: New program from scratch
Description of every module: reports from large Internet companies indicate that 10% of files are accessed by 90% of users, i.e. there is a strong skew in file popularity. Unfortunately, distributed file systems and databases such as HDFS and HBase use the same static replication factor for every file or table. As a result, some servers run hot, with high bandwidth consumption and MapReduce tasks or HBase queries concentrated on them, while other servers stay relatively cold, which hurts both user experience and data-center utilization.
Goal: a dynamic, online approach that identifies which HDFS blocks are hot and decides where new replicas of those blocks should be placed.
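The skew described above can be illustrated with a small simulation. This is a hypothetical sketch, not part of the project: file popularity is modeled with a Zipf-like distribution (a common assumption for web and file-access workloads), and the specific parameters are made up for illustration.

```python
import random
from collections import Counter

def simulate_accesses(num_files=1000, num_accesses=100_000, zipf_s=1.2):
    """Draw file accesses from a Zipf-like popularity distribution."""
    # Weight file at popularity rank r proportionally to 1 / (r+1)^s,
    # so a small number of files dominates the access stream.
    weights = [1.0 / (rank + 1) ** zipf_s for rank in range(num_files)]
    files = random.choices(range(num_files), weights=weights, k=num_accesses)
    return Counter(files)

def top_fraction_share(counts, fraction=0.10):
    """Fraction of all accesses that go to the most popular `fraction` of files."""
    ordered = sorted(counts.values(), reverse=True)
    top_n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)

random.seed(0)
counts = simulate_accesses()
share = top_fraction_share(counts)
print(f"Top 10% of files receive {share:.0%} of accesses")
```

Under these assumed parameters, a small minority of files absorbs the majority of accesses, which is exactly the situation a uniform, static replication factor handles poorly.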
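One possible shape for the online hot-block analysis is an exponentially decayed access counter per block: recent accesses count fully, older ones fade out, and blocks whose decayed score crosses a threshold are flagged as hot. The sketch below is an assumption about how this could be structured, not the required design; the block IDs, the threshold, and the half-life are hypothetical. A real implementation would feed it from NameNode audit logs and raise the replication factor of hot files via the actual HDFS API (`FileSystem.setReplication(Path, short)` in Java, or `hdfs dfs -setrep` from the shell).

```python
import time
from collections import defaultdict

class HotBlockTracker:
    """Track per-block access rates with exponential decay and flag hot blocks."""

    def __init__(self, half_life_s=300.0, hot_threshold=50.0):
        self.half_life_s = half_life_s      # time for a score to halve
        self.hot_threshold = hot_threshold  # decayed score deemed "hot"
        self.scores = defaultdict(float)    # block_id -> decayed access count
        self.last_seen = {}                 # block_id -> timestamp of last update

    def _decay(self, block_id, now):
        # Apply the decay accumulated since the block's last update.
        last = self.last_seen.get(block_id)
        if last is not None:
            elapsed = now - last
            self.scores[block_id] *= 0.5 ** (elapsed / self.half_life_s)
        self.last_seen[block_id] = now

    def record_access(self, block_id, now=None):
        now = time.time() if now is None else now
        self._decay(block_id, now)
        self.scores[block_id] += 1.0

    def hot_blocks(self, now=None):
        """Return block IDs whose decayed score currently exceeds the threshold."""
        now = time.time() if now is None else now
        hot = []
        for block_id in list(self.scores):
            self._decay(block_id, now)
            if self.scores[block_id] >= self.hot_threshold:
                hot.append(block_id)
        return hot

# Illustration with explicit timestamps: blk_1 is accessed 100 times at t=0,
# blk_2 once; after two half-lives blk_1's score has decayed below threshold.
tracker = HotBlockTracker(half_life_s=300.0, hot_threshold=50.0)
for _ in range(100):
    tracker.record_access("blk_1", now=0.0)
tracker.record_access("blk_2", now=0.0)
hot_now = tracker.hot_blocks(now=0.0)
hot_later = tracker.hot_blocks(now=600.0)
```

The decay-based counter keeps memory bounded (one float per recently seen block) and naturally "forgets" blocks whose popularity has passed, which matters for an online approach; where the extra replicas should land (e.g. on lightly loaded DataNodes, respecting rack awareness) would be decided by a separate placement step.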
Description of requirements/functionality: same as the module description above.
OS requirements: Linux
Extra notes:
Rajender K. (United States)
- Projects completed: 0% (0)
- Freelancers worked with: -
- Projects awarded: 0%
- Last project: 6 May 2024