Hadoop source-code change in the replication policy of HDFS
- Proposals: 2
- Remote
- #1626708
- Expired
Description
Experience Level: Intermediate
General information for the business: Hadoop replication policy
Kind of development: New program from scratch
Description of every module: reports from large Internet companies indicate that 10% of files are accessed by 90% of users, i.e. there is a strong skew in file popularity. Unfortunately, distributed file systems and databases such as HDFS and HBase use the same static replication factor for every file or table. As a result, some servers run hot, with high bandwidth consumption and MapReduce tasks or HBase queries concentrated on them, while other servers stay relatively cold, which hurts both user experience and data-center utilization.
Goal: a dynamic, online approach that identifies which HDFS blocks are hot and decides where new replicas of those blocks should be placed.
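The skew described above can be illustrated with a small simulation. This is a hypothetical sketch, not part of the project: file popularity is modeled with a Zipf-like distribution (a common assumption for web and file-access workloads), and the specific parameters are made up for illustration.

```python
import random
from collections import Counter

def simulate_accesses(num_files=1000, num_accesses=100_000, zipf_s=1.2):
    """Draw file accesses from a Zipf-like popularity distribution."""
    # Weight file at popularity rank r proportionally to 1 / (r+1)^s,
    # so a small number of files dominates the access stream.
    weights = [1.0 / (rank + 1) ** zipf_s for rank in range(num_files)]
    files = random.choices(range(num_files), weights=weights, k=num_accesses)
    return Counter(files)

def top_fraction_share(counts, fraction=0.10):
    """Fraction of all accesses that go to the most popular `fraction` of files."""
    ordered = sorted(counts.values(), reverse=True)
    top_n = max(1, int(len(ordered) * fraction))
    return sum(ordered[:top_n]) / sum(ordered)

random.seed(0)
counts = simulate_accesses()
share = top_fraction_share(counts)
print(f"Top 10% of files receive {share:.0%} of accesses")
```

Under these assumed parameters, a small minority of files absorbs the majority of accesses, which is exactly the situation a uniform, static replication factor handles poorly.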
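One possible shape for the online hot-block analysis is an exponentially decayed access counter per block: recent accesses count fully, older ones fade out, and blocks whose decayed score crosses a threshold are flagged as hot. The sketch below is an assumption about how this could be structured, not the required design; the block IDs, the threshold, and the half-life are hypothetical. A real implementation would feed it from NameNode audit logs and raise the replication factor of hot files via the actual HDFS API (`FileSystem.setReplication(Path, short)` in Java, or `hdfs dfs -setrep` from the shell).

```python
import time
from collections import defaultdict

class HotBlockTracker:
    """Track per-block access rates with exponential decay and flag hot blocks."""

    def __init__(self, half_life_s=300.0, hot_threshold=50.0):
        self.half_life_s = half_life_s      # time for a score to halve
        self.hot_threshold = hot_threshold  # decayed score deemed "hot"
        self.scores = defaultdict(float)    # block_id -> decayed access count
        self.last_seen = {}                 # block_id -> timestamp of last update

    def _decay(self, block_id, now):
        # Apply the decay accumulated since the block's last update.
        last = self.last_seen.get(block_id)
        if last is not None:
            elapsed = now - last
            self.scores[block_id] *= 0.5 ** (elapsed / self.half_life_s)
        self.last_seen[block_id] = now

    def record_access(self, block_id, now=None):
        now = time.time() if now is None else now
        self._decay(block_id, now)
        self.scores[block_id] += 1.0

    def hot_blocks(self, now=None):
        """Return block IDs whose decayed score currently exceeds the threshold."""
        now = time.time() if now is None else now
        hot = []
        for block_id in list(self.scores):
            self._decay(block_id, now)
            if self.scores[block_id] >= self.hot_threshold:
                hot.append(block_id)
        return hot

# Illustration with explicit timestamps: blk_1 is accessed 100 times at t=0,
# blk_2 once; after two half-lives blk_1's score has decayed below threshold.
tracker = HotBlockTracker(half_life_s=300.0, hot_threshold=50.0)
for _ in range(100):
    tracker.record_access("blk_1", now=0.0)
tracker.record_access("blk_2", now=0.0)
hot_now = tracker.hot_blocks(now=0.0)
hot_later = tracker.hot_blocks(now=600.0)
```

The decay-based counter keeps memory bounded (one float per recently seen block) and naturally "forgets" blocks whose popularity has passed, which matters for an online approach; where the extra replicas should land (e.g. on lightly loaded DataNodes, respecting rack awareness) would be decided by a separate placement step.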
Description of requirements/functionality: same as the module description above.
OS requirements: Linux
Extra notes:
Rajender K. (United States)
- Projects completed: 0% (0)
- Freelancers worked with: -
- Projects awarded: 0%
- Last project: 6 May 2024