[go: up one dir, main page]

Sun et al., 2020 - Google Patents

Zeroth-order supervised policy improvement

Sun et al., 2020

View PDF
Document ID
13084663342327670866
Author
Sun H
Xu Z
Song Y
Fang M
Xiong J
Dai B
Zhou B
Publication year
Publication venue
arXiv preprint arXiv:2006.06600

External Links

Snippet

Policy gradient (PG) algorithms have been widely used in reinforcement learning (RL). However, PG algorithms rely on exploiting the value function being learned with the first- order update locally, which results in limited sample efficiency. In this work, we propose an …
Continue reading at arxiv.org (PDF) (other versions)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N99/00Subject matter not provided for in other groups of this subclass
    • G06N99/005Learning machines, i.e. computer in which a programme is changed according to experience gained by the machine itself during a complete run
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/12Computer systems based on biological models using genetic models
    • G06N3/126Genetic algorithms, i.e. information processing using digital simulations of the genetic system
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computer systems utilising knowledge based models
    • G06N5/04Inference methods or devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computer systems based on biological models
    • G06N3/02Computer systems based on biological models using neural network models
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06NCOMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computer systems based on specific mathematical models
    • G06N7/005Probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6217Design or setup of recognition systems and techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06QDATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6267Classification techniques
    • G06K9/6279Classification techniques relating to the number of classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/50Computer-aided design
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06FELECTRICAL DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06QDATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management, e.g. organising, planning, scheduling or allocating time, human or machine resources; Enterprise planning; Organisational models
    • G06Q10/063Operations research or analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING; COUNTING
    • G06KRECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/62Methods or arrangements for recognition using electronic means
    • G06K9/6296Graphical models, e.g. Bayesian networks

Similar Documents

Publication Publication Date Title
Mannor et al. Mean-variance optimization in Markov decision processes
Houthooft et al. Vime: Variational information maximizing exploration
US20210319362A1 (en) Incentive control for multi-agent systems
Ball et al. Ready policy one: World building through active learning
Sun et al. Zeroth-order supervised policy improvement
Henaff Explicit explore-exploit algorithms in continuous state spaces
Mutti et al. An intrinsically-motivated approach for learning highly exploring and fast mixing policies
Sinclair et al. Adaptive discretization in online reinforcement learning
Tang et al. Adaptive inference reinforcement learning for task offloading in vehicular edge computing systems
Eysenbach et al. Imitating past successes can be very suboptimal
Pavelski et al. ELMOEA/D-DE: Extreme learning surrogate models in multi-objective optimization based on decomposition and differential evolution
Lu et al. Hyper-parameter tuning under a budget constraint
Sallam et al. IMODEII: An Improved IMODE algorithm based on the Reinforcement Learning
Paulson et al. Efficient multi-step lookahead Bayesian optimization with local search constraints
Garmendia et al. Marco: A memory-augmented reinforcement framework for combinatorial optimization
Storvik et al. Lagrangian-based methods for finding MAP solutions for MRF models
Sun et al. Supervised Q-learning can be a strong baseline for continuous control
Smith et al. Strategic knowledge transfer
Khoi et al. Multi-objective exploration for proximal policy optimization
Sun et al. Supervised q-learning for continuous control
Gattami Reinforcement learning of Markov decision processes with peak constraints
Faury et al. Improving evolutionary strategies with generative neural networks
Tokmak et al. PACSBO: Probably approximately correct safe Bayesian optimization
Page et al. Repeated weighted boosting search for discrete or mixed search space and multiple-objective optimisation
Likmeta et al. Directed exploration via uncertainty-aware critics