
Learning optimal control policies from data: a partially model-based actor-only approach

Ferrarotti, Laura (2022) Learning optimal control policies from data: a partially model-based actor-only approach. Advisor: Bemporad, Prof. Alberto. pp. 216. [IMT PhD Thesis]

Text (Doctoral thesis): Ferrarotti_phdthesis.pdf - Published Version (3MB)
Available under License Creative Commons Attribution Non-commercial Share Alike.

Abstract

This dissertation presents new algorithms for learning optimal feedback controllers directly from experimental data, treating the plant to be controlled as a black-box source of streaming input and output data. The presented methods belong to the "actor-only" family of Reinforcement Learning algorithms, employing a representation (policy parameterization) of the controller as a function of the feedback values and of a set of parameters to be tuned. Optimizing a policy parameterization amounts to searching for the set of parameters associated with the best value of a chosen performance index. This search is carried out via numerical optimization techniques, such as the Stochastic Gradient Descent algorithm and related methods. The proposed methods combine the data-driven policy search framework with elements of the model-based setting, in order to mitigate some of the drawbacks of the purely data-driven approach while retaining a low modeling effort compared to the typical identification and model-based control design workflow.

In particular, we first introduce an algorithm for the search of smooth control policies, considering both the online scenario (in which new data are collected from the plant during the iterative policy synthesis, while the plant is under closed-loop control) and the offline one (i.e., from open-loop data previously collected from the plant). The proposed method is then extended to learn non-smooth control policies, in particular hybrid control laws, optimizing both the local controllers and the switching law directly from data. Finally, the described methods are extended to a collaborative learning setup, considering multi-agent systems characterized by strong similarities and exploiting a cloud-aided scenario to enhance the learning process by sharing information.
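To illustrate the actor-only idea described above, the sketch below tunes a parameterized feedback policy by Stochastic Gradient Descent on a performance index evaluated over rollouts. Everything here is a hypothetical toy, not the thesis's actual algorithm: the scalar plant x_{k+1} = a*x_k + b*u_k, the linear policy u = -theta*x, the quadratic cost weights, and the finite-difference gradient estimate are all assumptions made for the example.

```python
import numpy as np

a, b = 0.9, 0.5   # assumed (hypothetical) plant dynamics
q, r = 1.0, 0.1   # stage-cost weights of the performance index

def rollout_cost(theta, x0=1.0, horizon=30):
    """Closed-loop cost of the policy u = -theta * x over one rollout."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -theta * x
        cost += q * x**2 + r * u**2
        x = a * x + b * u
    return cost

def sgd_policy_search(theta0=0.0, steps=200, lr=1e-2, eps=1e-4, seed=0):
    """Actor-only search: SGD on the policy parameter theta,
    with a central finite-difference gradient estimate computed
    on a randomly sampled initial condition at each step."""
    rng = np.random.default_rng(seed)
    theta = theta0
    for _ in range(steps):
        x0 = rng.uniform(0.5, 1.5)             # random scenario per step
        grad = (rollout_cost(theta + eps, x0)
                - rollout_cost(theta - eps, x0)) / (2 * eps)
        theta -= lr * grad                     # stochastic gradient step
    return theta

theta_star = sgd_policy_search()
```

Using a common random initial state for the two perturbed rollouts keeps the gradient estimate consistent; in a "partially model-based" variant, the rollouts would be simulated on a coarse model instead of drawn from the physical plant.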

Item Type: IMT PhD Thesis
Subjects: Q Science > QA Mathematics > QA75 Electronic computers. Computer science
PhD Course: Computer science and systems engineering
Identification Number: https://e-theses.imtlucca.it/352/
NBN Number: urn:nbn:it:imtlucca-28310
Date Deposited: 17 Jun 2022 07:38
URI: http://e-theses.imtlucca.it/id/eprint/352
