Sun et al., 2020 - Google Patents
Zeroth-order supervised policy improvement
- Document ID
- 13084663342327670866
- Author
- Sun H
- Xu Z
- Song Y
- Fang M
- Xiong J
- Dai B
- Zhou B
- Publication year
- 2020
- Publication venue
- arXiv preprint arXiv:2006.06600
Snippet
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL). However, PG algorithms rely on exploiting the value function being learned with the first-order update locally, which results in limited sample efficiency. In this work, we propose an …
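The snippet cuts off, but the zeroth-order idea it contrasts with first-order policy gradients can be illustrated: instead of differentiating the learned value function with respect to the action, one samples candidate actions, scores them under the Q-function, and uses the best candidate as a regression target for the policy. The sketch below is a minimal illustration of that sampling-based improvement step under these assumptions, not the paper's implementation; `q_fn`, `policy_fn`, and all hyperparameters are hypothetical placeholders.

```python
import numpy as np

def zeroth_order_improved_action(q_fn, policy_fn, state,
                                 n_samples=16, sigma=0.1,
                                 low=-1.0, high=1.0):
    """Zeroth-order improvement step: pick the best sampled action under Q.

    Illustrative sketch only. No gradient of q_fn w.r.t. the action is
    taken; candidates come from local Gaussian perturbations of the
    policy's action plus uniform global samples, so the search is not
    confined to a first-order neighborhood of the current policy.
    """
    a_pi = np.asarray(policy_fn(state))                # current policy action
    dim = a_pi.shape[-1]
    local = a_pi + sigma * np.random.randn(n_samples, dim)    # local candidates
    global_ = np.random.uniform(low, high, (n_samples, dim))  # global candidates
    candidates = np.clip(np.vstack([a_pi[None, :], local, global_]), low, high)
    scores = np.array([q_fn(state, a) for a in candidates])   # only Q evaluations
    return candidates[np.argmax(scores)]   # regression target for the policy
```

The "supervised" part of the title would then correspond to fitting the policy to these improved actions, e.g. by minimizing a squared error between `policy_fn(state)` and the returned action over a batch, rather than ascending a policy gradient.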
Classifications
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  - G06N99/005—Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
  - G06N3/02—Computer systems based on biological models using neural network models
  - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
  - G06N5/04—Inference methods or devices
  - G06N7/005—Probabilistic networks
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
  - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
  - G06K9/6279—Classification techniques relating to the number of classes
  - G06K9/6296—Graphical models, e.g. Bayesian networks
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
  - G06F17/10—Complex mathematical operations
  - G06F17/50—Computer-aided design
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES
  - G06Q10/04—Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"
  - G06Q10/063—Operations research or analysis
Similar Documents
Publication | Title
---|---
Mannor et al. | Mean-variance optimization in Markov decision processes
Houthooft et al. | VIME: Variational information maximizing exploration
US20210319362A1 (en) | Incentive control for multi-agent systems
Ball et al. | Ready policy one: World building through active learning
Sun et al. | Zeroth-order supervised policy improvement
Henaff | Explicit explore-exploit algorithms in continuous state spaces
Mutti et al. | An intrinsically-motivated approach for learning highly exploring and fast mixing policies
Sinclair et al. | Adaptive discretization in online reinforcement learning
Tang et al. | Adaptive inference reinforcement learning for task offloading in vehicular edge computing systems
Eysenbach et al. | Imitating past successes can be very suboptimal
Pavelski et al. | ELMOEA/D-DE: Extreme learning surrogate models in multi-objective optimization based on decomposition and differential evolution
Lu et al. | Hyper-parameter tuning under a budget constraint
Sallam et al. | IMODEII: An Improved IMODE algorithm based on the Reinforcement Learning
Paulson et al. | Efficient multi-step lookahead Bayesian optimization with local search constraints
Garmendia et al. | MARCO: A memory-augmented reinforcement framework for combinatorial optimization
Storvik et al. | Lagrangian-based methods for finding MAP solutions for MRF models
Sun et al. | Supervised Q-learning can be a strong baseline for continuous control
Smith et al. | Strategic knowledge transfer
Khoi et al. | Multi-objective exploration for proximal policy optimization
Sun et al. | Supervised Q-learning for continuous control
Gattami | Reinforcement learning of Markov decision processes with peak constraints
Faury et al. | Improving evolutionary strategies with generative neural networks
Tokmak et al. | PACSBO: Probably approximately correct safe Bayesian optimization
Page et al. | Repeated weighted boosting search for discrete or mixed search space and multiple-objective optimisation
Likmeta et al. | Directed exploration via uncertainty-aware critics