Python Program
€30 (approx. $32)
- Posted:
- Proposals: 17
- Remote
- #3888192
- Awarded
EXPERIENCED WRITER, Graphic Designer ,VIDEOGRAPHER, VIDEO AND PHOTO EDITOR , AND TEXT EDITOR.
Nairobi
Description
Experience Level: Entry
Implement the following environment and algorithms in Python:
Environment:
* k stochastic bandits (arms).
* Each bandit i has a reward that is uniformly distributed in [a_i, b_i].
* a_i and b_i should be chosen randomly in [0, 1] and must be different for each arm i.
* For example, if a_i = 0.3 and b_i = 0.8 for arm i, then its reward at any given time can take any value in [0.3, 0.8] with equal probability ("uniform"), and its expected reward is mu_i = 0.55.
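A minimal sketch of such an environment is given here, using only the standard library. The class name `UniformBandit`, the `seed` parameter, and the choice to draw two endpoints per arm and sort them are assumptions made for illustration; the posting does not prescribe them. Because the endpoints are continuous random draws, two arms sharing the same interval is practically impossible, so no explicit duplicate check is included.

```python
import random


class UniformBandit:
    """k-armed bandit; arm i pays a reward drawn from Uniform[a_i, b_i]."""

    def __init__(self, k, seed=None):
        self.k = k
        self.rng = random.Random(seed)  # hypothetical seed argument, for reproducibility
        self.a, self.b = [], []
        for _ in range(k):
            # Draw two points in [0, 1] and sort them so that a_i <= b_i.
            lo, hi = sorted((self.rng.random(), self.rng.random()))
            self.a.append(lo)
            self.b.append(hi)
        # Expected reward of a uniform distribution is the interval midpoint.
        self.mu = [(lo + hi) / 2 for lo, hi in zip(self.a, self.b)]

    def pull(self, arm):
        """Return one random reward from the chosen arm."""
        return self.rng.uniform(self.a[arm], self.b[arm])


env = UniformBandit(k=10, seed=0)
reward = env.pull(3)  # lies in [env.a[3], env.b[3]]
```

With a_i = 0.3 and b_i = 0.8 the midpoint formula gives mu_i = (0.3 + 0.8) / 2 = 0.55, matching the example above.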
Algorithms:
* ε-Greedy: assume ε_t decays over time according to the theorem in the slides.
* Upper Confidence Bound (UCB) algorithm.
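The two algorithms could be sketched roughly as follows. The slides are not part of this posting, so the decay schedule ε_t = min(1, (k ln t / t)^(1/3)) is only an assumed, commonly used choice and should be replaced by whatever the theorem in the slides specifies. The UCB variant shown is the standard UCB1 index (empirical mean plus sqrt(2 ln t / n_i)) for rewards in [0, 1]. The function names and the `pull` callback are hypothetical.

```python
import math
import random


def epsilon_greedy(pull, k, T, rng):
    """Play T rounds of epsilon-greedy; return the list of chosen arms."""
    counts = [0] * k
    means = [0.0] * k
    chosen = []
    for t in range(1, T + 1):
        # ASSUMED schedule (not from the slides): eps_t = min(1, (k ln t / t)^(1/3)).
        # ln(1) = 0, so the first round is forced to explore.
        eps = 1.0 if t == 1 else min(1.0, (k * math.log(t) / t) ** (1 / 3))
        if rng.random() < eps:
            arm = rng.randrange(k)  # explore: uniformly random arm
        else:
            arm = max(range(k), key=means.__getitem__)  # exploit: best empirical mean
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        chosen.append(arm)
    return chosen


def ucb1(pull, k, T):
    """Play T rounds of UCB1; return the list of chosen arms."""
    counts = [0] * k
    means = [0.0] * k
    chosen = []
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1  # initialisation: play every arm once
        else:
            # Optimistic index: empirical mean plus confidence radius.
            arm = max(
                range(k),
                key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        chosen.append(arm)
    return chosen
```

Both functions only need a callable `pull(arm)` that returns a reward, so any environment with uniform rewards in [a_i, b_i] can be plugged in.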
Measurement Tasks:
* Produce plots that confirm or refute the respective sublinear regret rates of each algorithm.
* Compare the convergence/learning speed of the two algorithms for T = 1000, k = 10.
* Repeat the comparison above for two further scenarios with different T and k values, and comment on the differences and similarities.
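One possible way to approach the measurements is sketched below. It assumes regret is measured as cumulative pseudo-regret, i.e. the sum over rounds of (best expected reward minus expected reward of the chosen arm), and that "sublinear" is checked visually by watching the average regret R_t / t shrink towards zero. The helper names are hypothetical, and matplotlib is a third-party dependency that is imported only when plotting.

```python
def cumulative_regret(chosen, mu):
    """Cumulative pseudo-regret curve for a sequence of chosen arms."""
    best = max(mu)
    total, curve = 0.0, []
    for arm in chosen:
        total += best - mu[arm]  # expected loss of this round
        curve.append(total)
    return curve


def average_regret(curve):
    """R_t / t for every round t; tends to 0 when regret is sublinear."""
    return [r / t for t, r in enumerate(curve, start=1)]


def plot_regret(curves, title, path=None):
    """Plot cumulative and average regret for a dict {label: curve}."""
    import matplotlib.pyplot as plt  # third-party, imported lazily

    fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
    for label, curve in curves.items():
        rounds = range(1, len(curve) + 1)
        left.plot(rounds, curve, label=label)
        right.plot(rounds, average_regret(curve), label=label)
    left.set_xlabel("Round t")
    left.set_ylabel("Cumulative regret R_t")
    left.set_title("Cumulative regret")
    right.set_xlabel("Round t")
    right.set_ylabel("Average regret R_t / t")
    right.set_title("Average regret")
    left.legend()
    right.legend()
    fig.suptitle(title)
    if path is not None:
        fig.savefig(path)
    return fig


# Tiny worked example with two arms (mu = 0.55 and 0.25):
curve = cumulative_regret([0, 1, 1, 0], [0.55, 0.25])
```

For the comparison at T = 1000, k = 10, the curves of both algorithms would typically be averaged over several independent runs before plotting, since a single run is noisy. The same function can then be reused for the two additional (T, k) scenarios.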
Hand in:
* A Python notebook with the code. It must execute correctly and produce the plots described above, and it MUST be commented in detail.
* A short report (1-2 pages max) with the measurement plots and brief comments on each. All plots should include axis titles, legends, etc., so that they are readable.
Christos A.
100% (5)
Projects Completed: 4
Freelancers worked with: 4
Projects awarded: 26%
Last project: 27 Mar 2023
United Kingdom