Python Program

Ends in (days)

1036

Fixed Price

€30 (approx. $35)

Posted: 3 years ago
Proposals: 16
Remote
#3888192
Awarded

Description

Experience Level: Entry

You are required to implement the following environment and algorithms in Python:

Environment:
* A stochastic bandit with k arms.
* Each arm i has a reward that is uniformly distributed in [a_i, b_i].
* a_i and b_i should be chosen randomly in [0,1], with a_i < b_i, and must be different for each arm i.
* Example: if a_i = 0.3 and b_i = 0.8 for arm i, then its reward at a given time can take any value in [0.3, 0.8] with equal probability ("uniform"), and its expected reward is mu_i = (a_i + b_i)/2 = 0.55.
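The environment described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name `UniformBandit`, the use of NumPy, and the trick of sorting two independent uniform draws to obtain a_i <= b_i are not part of the brief. Continuous draws are distinct across arms with probability 1, which covers the "different for each arm" requirement in practice.

```python
import numpy as np


class UniformBandit:
    """Hypothetical k-armed bandit: arm i pays a Uniform[a_i, b_i] reward."""

    def __init__(self, k, rng):
        self.k = k
        self.rng = rng
        # Two independent draws in [0, 1] per arm; sorting each pair
        # gives the lower bound a_i and the upper bound b_i.
        bounds = np.sort(rng.uniform(0.0, 1.0, size=(k, 2)), axis=1)
        self.a, self.b = bounds[:, 0], bounds[:, 1]
        # Expected reward of a uniform distribution: mu_i = (a_i + b_i) / 2.
        self.means = (self.a + self.b) / 2.0

    def pull(self, arm):
        """Draw one reward from the chosen arm."""
        return self.rng.uniform(self.a[arm], self.b[arm])


rng = np.random.default_rng(0)  # fixed seed so runs are reproducible
env = UniformBandit(k=10, rng=rng)
reward = env.pull(3)
```

Keeping `means` on the environment makes it easy to compute regret later without estimating the true means from samples.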

Algorithms:
* ε-Greedy: assume ε_t is reduced over time according to the theorem in the slides.
* Upper Confidence Bound (UCB) algorithm.
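A minimal sketch of the two algorithms is given below, with several assumptions stated up front. The slides and their theorem are not available here, so the schedule ε_t = min(1, c·k/t) is only a placeholder that should be swapped for the schedule the theorem prescribes. The UCB variant shown is UCB1 with bonus sqrt(2 ln t / n_i), which is an assumed choice. The function names, the `pull` callable, and the three fixed demo arms are all hypothetical.

```python
import numpy as np


def epsilon_greedy(pull, k, T, rng, c=1.0):
    """ε-greedy with a decaying exploration rate; returns the arms chosen."""
    counts, sums = np.zeros(k), np.zeros(k)
    chosen = np.empty(T, dtype=int)
    for t in range(1, T + 1):
        eps = min(1.0, c * k / t)  # ASSUMED schedule, not taken from the slides
        if rng.random() < eps:
            arm = int(rng.integers(k))  # explore: uniformly random arm
        else:
            # exploit: highest empirical mean (unpulled arms count as 0)
            arm = int(np.argmax(sums / np.maximum(counts, 1)))
        counts[arm] += 1
        sums[arm] += pull(arm)
        chosen[t - 1] = arm
    return chosen


def ucb1(pull, k, T):
    """UCB1 (assumed variant); returns the arms chosen."""
    counts, sums = np.zeros(k), np.zeros(k)
    chosen = np.empty(T, dtype=int)
    for t in range(1, T + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull every arm once
        else:
            bonus = np.sqrt(2.0 * np.log(t) / counts)  # confidence radius
            arm = int(np.argmax(sums / counts + bonus))
        counts[arm] += 1
        sums[arm] += pull(arm)
        chosen[t - 1] = arm
    return chosen


# Demo on three hand-picked arms (illustrative values only).
rng = np.random.default_rng(0)
a = np.array([0.0, 0.2, 0.6])
b = np.array([0.2, 0.6, 1.0])
pull = lambda arm: rng.uniform(a[arm], b[arm])

eg_choices = epsilon_greedy(pull, k=3, T=2000, rng=rng)
ucb_choices = ucb1(pull, k=3, T=2000)
```

Both functions return the sequence of chosen arms rather than the rewards, so that regret can be computed afterwards from the true means.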

Measurement Tasks:
* Produce plots that confirm or refute the respective sublinear regret rates for each scheme.
* Compare the convergence/learning speed of the two algorithms for T = 1000, k = 10.
* Repeat the comparison above for two further scenarios with different T, k values and comment on the differences and similarities.
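One possible way to approach the regret measurements is sketched here, again under assumptions. Regret is taken to mean cumulative pseudo-regret (best mean minus the mean of the chosen arm, summed over rounds), and the growth rate is estimated from the slope of a log-log fit. The helper names, the burn-in of 100 rounds, and the averaging over 20 runs are hypothetical choices. A uniformly random policy stands in for the recorded choices of a real algorithm, so the fitted exponent here should come out close to 1 (linear regret); a sublinear scheme would give a noticeably smaller value.

```python
import numpy as np


def pseudo_regret(chosen, means):
    """Cumulative pseudo-regret: running sum of (best mean - chosen arm's mean)."""
    return np.cumsum(means.max() - means[chosen])


def growth_exponent(regret, burn_in=100):
    """Slope of log(regret) against log(t) after burn_in rounds.

    A value near 1 indicates linear regret; clearly below 1 suggests sublinear.
    """
    t = np.arange(1, len(regret) + 1)
    slope, _ = np.polyfit(np.log(t[burn_in:]), np.log(regret[burn_in:]), 1)
    return slope


rng = np.random.default_rng(0)
T, k, runs = 1000, 10, 20  # T and k from the brief; runs is an assumed choice

# True arm means for uniform rewards on [a_i, b_i].
bounds = np.sort(rng.uniform(size=(k, 2)), axis=1)
means = bounds.mean(axis=1)

# Baseline only: a uniformly random policy in place of an algorithm's choices.
avg_regret = np.mean(
    [pseudo_regret(rng.integers(k, size=T), means) for _ in range(runs)],
    axis=0,
)
exponent = growth_exponent(avg_regret)
```

Plotting `avg_regret` against the round index for each algorithm, with labelled axes and a legend, then gives the curves whose shape can be compared across scenarios.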

Hand in:
* A Python notebook containing the code. It must execute correctly and produce the plots listed above, and it MUST be commented in detail.
* A short report (1-2 pages max) with the measurement plots and brief comments on each. All plots should include axis titles, legends, etc., so that they are readable.

Christos A.
