Transfer / meta / lifelong learning

Published Jan 22, 2026

RL with policy advice. Azar et al., ECML 2013.

- Reduction from RL to a bandit problem.
- Regret bounds, with regret defined as the cumulative difference between the performance of the optimal policy and that of the policy actually followed.
- Regret scales with the number of tasks as $\sqrt{M}$, rather than with the size of the state and action spaces.

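The reduction to a bandit problem can be illustrated with a minimal sketch. Each candidate (advice) policy is treated as an arm, and a UCB1 index chooses which policy to run in each episode. This is a simplified stand-in, not the algorithm from Azar et al. It assumes episodic returns in [0, 1], and `run_episode`, `ucb_over_policies`, and the example values are hypothetical. Regret after $T$ episodes is computed as $\sum_{t=1}^{T} (V^{*} - V^{\pi_t})$, where $V^{*}$ is the best candidate's expected return and $\pi_t$ is the policy chosen in episode $t$.

```python
import math
import random
from typing import Callable, List

# Sketch only: UCB1 over a fixed set of candidate policies (hypothetical helper names),
# not the algorithm from the paper.


def ucb_over_policies(
    run_episode: Callable[[int], float],
    num_policies: int,
    num_episodes: int,
) -> List[int]:
    """UCB1 with each candidate policy as an arm; returns the index chosen in each episode."""
    counts = [0] * num_policies
    means = [0.0] * num_policies
    choices: List[int] = []
    for t in range(num_episodes):
        if t < num_policies:
            i = t  # run every policy once to initialize its estimate
        else:
            # optimism in the face of uncertainty: empirical mean + confidence bonus
            i = max(
                range(num_policies),
                key=lambda j: means[j] + math.sqrt(2 * math.log(t) / counts[j]),
            )
        ret = run_episode(i)  # observed episodic return, assumed to lie in [0, 1]
        counts[i] += 1
        means[i] += (ret - means[i]) / counts[i]  # incremental mean update
        choices.append(i)
    return choices


def regret(choices: List[int], true_values: List[float]) -> float:
    """Sum over episodes of (best policy's expected return - chosen policy's expected return)."""
    best = max(true_values)
    return sum(best - true_values[i] for i in choices)


# Hypothetical example: 3 advice policies with Bernoulli returns of known mean.
random.seed(0)
values = [0.2, 0.5, 0.8]
choices = ucb_over_policies(lambda i: float(random.random() < values[i]), len(values), 2000)
total_regret = regret(choices, values)
print(f"best policy chosen {choices.count(2)} times, regret = {total_regret:.1f}")
```

Only the number of arms enters the confidence bonus; the loop never touches states or actions, which is the sense in which this kind of reduction can decouple regret from the size of the state and action spaces.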
Brunskill and Li, UAI 2013. Reduces RL to an (active) classification problem.
https://cs.stanford.edu/people/ebrun

Provably speeding up multitask RL. Guo and Brunskill, AAAI 2015. $K$ tasks are sampled from a set of $M$ tasks. Evaluation goal: provably improve performance. Approach: quickly cluster the tasks, then share information within each cluster.

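The "cluster, then share" idea can be illustrated with a deliberately simplified sketch. Each task is summarized by a vector of empirical estimates (here, hypothetical per-state-action mean rewards), tasks whose estimates lie within a max-norm threshold `eps` of a cluster representative are grouped greedily, and samples are then pooled within each group. The summary statistic, the distance, the greedy grouping, and all helper names are illustrative assumptions, not the procedure of Guo and Brunskill.

```python
from statistics import mean
from typing import List

# samples[k][j]: rewards observed in task k for (hypothetical) state-action pair j
Samples = List[List[List[float]]]


def empirical_means(samples: Samples) -> List[List[float]]:
    """Per-task summary vector: mean observed reward for each state-action pair."""
    return [[mean(obs) for obs in task] for task in samples]


def cluster_tasks(estimates: List[List[float]], eps: float) -> List[List[int]]:
    """Greedy clustering: a task joins the first cluster whose representative is within eps (max-norm)."""
    clusters: List[List[int]] = []
    reps: List[List[float]] = []
    for k, est in enumerate(estimates):
        for c, rep in enumerate(reps):
            if max(abs(a - b) for a, b in zip(est, rep)) <= eps:
                clusters[c].append(k)
                break
        else:  # no close cluster found: this task starts a new one
            clusters.append([k])
            reps.append(est)
    return clusters


def share_within_clusters(samples: Samples, clusters: List[List[int]]) -> List[List[float]]:
    """For every task, re-estimate each pair's mean from the pooled samples of its whole cluster."""
    shared: List[List[float]] = [[] for _ in samples]
    for members in clusters:
        num_pairs = len(samples[members[0]])
        pooled = [mean(x for k in members for x in samples[k][j]) for j in range(num_pairs)]
        for k in members:
            shared[k] = pooled
    return shared


samples: Samples = [
    [[1.0, 0.8], [0.0, 0.2]],  # task 0
    [[0.9, 1.0], [0.1, 0.0]],  # task 1, similar to task 0
    [[0.0, 0.1], [1.0, 0.9]],  # task 2, different
]
clusters = cluster_tasks(empirical_means(samples), eps=0.2)
shared = share_within_clusters(samples, clusters)
print(clusters, shared)
```

Pooling doubles the number of samples behind each estimate for tasks 0 and 1, which helps only if they really share the same model; `eps` trades this gain against the bias of merging dissimilar tasks.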
Killian et al., NIPS 2017. Bayesian neural networks (BNNs) for modeling MDP dynamics.

Smooth latent policy space for cross-domain transfer. Bou Ammar et al., IJCAI 2015. Limited theoretical results, though it includes some nice convergence results.

Model-agnostic meta-learning. Finn et al., ICML 2017.
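To make the MAML update concrete, here is a minimal sketch on a toy family of 1-D regression tasks $y = a x$, with a single-parameter model $\hat y = w x$ and mean-squared error, so that the inner-loop step and the second-order meta-gradient can be written out by hand. The task family, the slopes, the step sizes `alpha` and `beta`, the use of all tasks in every meta-step, and the helper names are hypothetical. Finn et al. apply the same two-level update to neural networks, using automatic differentiation and sampled batches of tasks.

```python
from typing import List, Tuple

# (x_train, y_train, x_test, y_test) for one hypothetical task
Task = Tuple[List[float], List[float], List[float], List[float]]


def loss_grad_hess(w: float, xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """MSE loss of y_hat = w * x, with its first and second derivatives w.r.t. w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    hess = sum(2 * x * x for x in xs) / n
    return loss, grad, hess


def adapt(w: float, x_train: List[float], y_train: List[float], alpha: float) -> float:
    """Inner loop: one gradient step on the task's training split."""
    _, g, _ = loss_grad_hess(w, x_train, y_train)
    return w - alpha * g


def maml_step(w: float, tasks: List[Task], alpha: float, beta: float) -> float:
    """Outer loop: step on the post-adaptation test loss, differentiating through the inner step."""
    meta_grad = 0.0
    for x_tr, y_tr, x_te, y_te in tasks:
        _, g_tr, h_tr = loss_grad_hess(w, x_tr, y_tr)
        w_adapted = w - alpha * g_tr
        _, g_te, _ = loss_grad_hess(w_adapted, x_te, y_te)
        # chain rule: d(w_adapted)/dw = 1 - alpha * h_tr (the second-order term is kept)
        meta_grad += g_te * (1 - alpha * h_tr)
    return w - beta * meta_grad / len(tasks)


def make_task(a: float) -> Task:
    """Hypothetical task with slope a and fixed train/test inputs."""
    x_train, x_test = [1.0, 2.0], [0.5, 1.5]
    return (x_train, [a * x for x in x_train], x_test, [a * x for x in x_test])


tasks = [make_task(a) for a in (1.0, 2.0, 3.0)]  # hypothetical slopes
alpha, beta = 0.1, 0.5
w = 0.0
for _ in range(100):
    w = maml_step(w, tasks, alpha, beta)
print(f"meta-learned init w = {w:.4f}")
```

For this linear toy problem the meta-gradient works out to $0.625\,(w - \bar a)$ with $\bar a = 2$, so the initialization converges to the mean slope, the same point that joint training on all tasks would reach. The example therefore shows the mechanics of differentiating through the inner step, not MAML's advantage over joint training.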
