Sun et al., 2020 - Google Patents
Zeroth-order supervised policy improvement
- Document ID
- 13084663342327670866
- Author
- Sun H
- Xu Z
- Song Y
- Fang M
- Xiong J
- Dai B
- Zhou B
- Publication year
- 2020
- Publication venue
- arXiv preprint arXiv:2006.06600
Snippet
Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL). However, PG algorithms rely on exploiting the value function being learned with the first-order update locally, which results in limited sample efficiency. In this work, we propose an …
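The snippet cuts off, but the zeroth-order idea it contrasts with first-order policy gradients can be illustrated: instead of differentiating the learned value function with respect to the action, one samples candidate actions, scores them under the Q-function, and uses the best candidate as a regression target for the policy. The sketch below is a minimal illustration of that sampling-based improvement step under these assumptions, not the paper's implementation; `q_fn`, `policy_fn`, and all hyperparameters are hypothetical placeholders.

```python
import numpy as np

def zeroth_order_improved_action(q_fn, policy_fn, state,
                                 n_samples=16, sigma=0.1,
                                 low=-1.0, high=1.0):
    """Zeroth-order improvement step: pick the best sampled action under Q.

    Illustrative sketch only. No gradient of q_fn w.r.t. the action is
    taken; candidates come from local Gaussian perturbations of the
    policy's action plus uniform global samples, so the search is not
    confined to a first-order neighborhood of the current policy.
    """
    a_pi = np.asarray(policy_fn(state))                # current policy action
    dim = a_pi.shape[-1]
    local = a_pi + sigma * np.random.randn(n_samples, dim)    # local candidates
    global_ = np.random.uniform(low, high, (n_samples, dim))  # global candidates
    candidates = np.clip(np.vstack([a_pi[None, :], local, global_]), low, high)
    scores = np.array([q_fn(state, a) for a in candidates])   # only Q evaluations
    return candidates[np.argmax(scores)]   # regression target for the policy
```

The "supervised" part of the title would then correspond to fitting the policy to these improved actions, e.g. by minimizing a squared error between `policy_fn(state)` and the returned action over a batch, rather than ascending a policy gradient.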
Classifications
- G06N—COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
  - G06N99/005—Learning machines, i.e. computers in which a programme is changed according to experience gained by the machine itself during a complete run
  - G06N3/02—Computer systems based on biological models using neural network models
  - G06N3/126—Genetic algorithms, i.e. information processing using digital simulations of the genetic system
  - G06N5/04—Inference methods or devices
  - G06N7/005—Probabilistic networks
- G06K—RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
  - G06K9/6217—Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
  - G06K9/6279—Classification techniques relating to the number of classes
  - G06K9/6296—Graphical models, e.g. Bayesian networks
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
  - G06F17/10—Complex mathematical operations
  - G06F17/50—Computer-aided design
- G06Q—DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES
  - G06Q10/04—Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"
  - G06Q10/063—Operations research or analysis
Similar Documents
Publication | Title
---|---
Mannor et al. | Mean-variance optimization in Markov decision processes
Houthooft et al. | VIME: Variational information maximizing exploration
US20210319362A1 (en) | Incentive control for multi-agent systems
Ball et al. | Ready policy one: World building through active learning
Sun et al. | Zeroth-order supervised policy improvement
Henaff | Explicit explore-exploit algorithms in continuous state spaces
Mutti et al. | An intrinsically-motivated approach for learning highly exploring and fast mixing policies
Sinclair et al. | Adaptive discretization in online reinforcement learning
Tang et al. | Adaptive inference reinforcement learning for task offloading in vehicular edge computing systems
Eysenbach et al. | Imitating past successes can be very suboptimal
Pavelski et al. | ELMOEA/D-DE: Extreme learning surrogate models in multi-objective optimization based on decomposition and differential evolution
Lu et al. | Hyper-parameter tuning under a budget constraint
Sallam et al. | IMODEII: An Improved IMODE algorithm based on the Reinforcement Learning
Paulson et al. | Efficient multi-step lookahead Bayesian optimization with local search constraints
Garmendia et al. | MARCO: A memory-augmented reinforcement framework for combinatorial optimization
Storvik et al. | Lagrangian-based methods for finding MAP solutions for MRF models
Sun et al. | Supervised Q-learning can be a strong baseline for continuous control
Smith et al. | Strategic knowledge transfer
Khoi et al. | Multi-objective exploration for proximal policy optimization
Sun et al. | Supervised Q-learning for continuous control
Gattami | Reinforcement learning of Markov decision processes with peak constraints
Faury et al. | Improving evolutionary strategies with generative neural networks
Tokmak et al. | PACSBO: Probably approximately correct safe Bayesian optimization
Page et al. | Repeated weighted boosting search for discrete or mixed search space and multiple-objective optimisation
Likmeta et al. | Directed exploration via uncertainty-aware critics