François-Lavet, V., Ernst, D., & Fonteneau, R. (2017).

*On overfitting and asymptotic bias in batch reinforcement learning with partial observability*. Eprint/Working paper retrieved from http://orbi.ulg.ac.be/handle/2268/214551. Peer reviewed

This paper stands in the context of reinforcement learning with partial observability and limited data. In this setting, we focus on the tradeoff between asymptotic bias (suboptimality with unlimited data) and ...

Olivier, F., Marulli, D., Ernst, D., & Fonteneau, R. (2017). Foreseeing New Control Challenges in Electricity Prosumer Communities.

*Proc. of the 10th Bulk Power Systems Dynamics and Control Symposium – IREP’2017*. Peer reviewed

This paper is dedicated to electricity prosumer communities, which are groups of people producing, sharing and consuming electricity locally. This paper focuses on building a rigorous mathematical framework in ...

Glavic, M., Fonteneau, R., & Ernst, D. (2017). Reinforcement Learning for Electric Power System Decision and Control: Past Considerations and Perspectives.

*The 20th World Congress of the International Federation of Automatic Control, Toulouse, 9-14 July 2017* (pp. 1-10). Peer reviewed

In this paper, we review past (including very recent) research considerations in using reinforcement learning (RL) to solve electric power system decision and control problems. The RL considerations are ...

Olivier, F., Ernst, D., & Fonteneau, R. (2017). Automatic phase identification of smart meter measurement data.

*Proc. of CIRED 2017*. Peer reviewed

This paper highlights the importance of the knowledge of the phase identification for the different measurement points inside a low-voltage distribution network. Besides considering existing solutions, we ...

Castronovo, M., François-Lavet, V., Fonteneau, R., Ernst, D., & Couëtoux, A. (2017). Approximate Bayes Optimal Policy Search using Neural Networks.

*Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017)*. Peer reviewed

Bayesian Reinforcement Learning (BRL) agents aim to maximise the expected collected rewards obtained when interacting with an unknown Markov Decision Process (MDP) while using some prior knowledge. State-of ...

Dubois, A., Wehenkel, A., Fonteneau, R., Olivier, F., & Ernst, D. (2017). An App-based Algorithmic Approach for Harvesting Local and Renewable Energy Using Electric Vehicles.

*Proceedings of the 9th International Conference on Agents and Artificial Intelligence (ICAART 2017)*. Peer reviewed

The emergence of electric vehicles (EVs), combined with the rise of renewable energy production capacities, will strongly impact the way electricity is produced, distributed and consumed in the very near ...

Fonteneau, R., & Ernst, D. (2017). On the Dynamics of the Deployment of Renewable Energy Production Capacities. In J. N. Furze, K. Swing, A. K. Gupta, R. H. McClatchey, & D. M. Reynolds (Eds.),

*Mathematical Advances Towards Sustainable Environmental Systems* (pp. 43-60). Springer. Peer reviewed

This chapter falls within the context of modeling the deployment of renewable energy production capacities in the scope of the energy transition. This problem is addressed from an energy point of view, i.e ...

François-Lavet, V., Taralla, D., Ernst, D., & Fonteneau, R. (2016). Deep Reinforcement Learning Solutions for Energy Microgrids Management.

*European Workshop on Reinforcement Learning (EWRL 2016)*. Peer reviewed

This paper addresses the problem of efficiently operating the storage devices in an electricity microgrid featuring photovoltaic (PV) panels with both short- and long-term storage capacities. The problem of ...

Castronovo, M., Ernst, D., Couëtoux, A., & Fonteneau, R. (2016, June). Benchmarking for Bayesian Reinforcement Learning.

*PLoS ONE*. Peer reviewed (verified by ORBi)

In the Bayesian Reinforcement Learning (BRL) setting, agents try to maximise the collected rewards while interacting with their environment, using some prior knowledge that is accessed beforehand. Many ...

Gemine, Q., Cornélusse, B., Glavic, M., Fonteneau, R., & Ernst, D. (2016). A Gaussian mixture approach to model stochastic processes in power systems.

*Proceedings of the 19th Power Systems Computation Conference (PSCC'16)*. Peer reviewed

Probabilistic methods are emerging for operating electrical networks, driven by the integration of renewable generation. We present an algorithm that models a stochastic process as a Markov process using a ...

François-Lavet, V., Gemine, Q., Ernst, D., & Fonteneau, R. (2016). Towards the Minimization of the Levelized Energy Costs of Microgrids using both Long-term and Short-term Storage Devices.

*Smart Grid: Networking, Data Management, and Business Models* (pp. 295-319). CRC Press. Peer reviewed

This chapter falls within the context of the optimization of the levelized energy cost (LEC) of microgrids featuring photovoltaic panels (PV) associated with both long-term (hydrogen) and short-term (batteries ...

Taralla, D., Qiu, Z., Sutera, A., Fonteneau, R., & Ernst, D. (2016). Decision Making from Confidence Measurement on the Reward Growth using Supervised Learning: A Study Intended for Large-Scale Video Games.

*Proceedings of the 8th International Conference on Agents and Artificial Intelligence (ICAART 2016) - Volume 2* (pp. 264-271). Peer reviewed

Video games have become more and more complex over the past decades. Today, players wander in visually and option-rich environments, and each choice they make, at any given time, can have a combinatorial ...

Aittahar, S., François-Lavet, V., Lodeweyckx, S., Ernst, D., & Fonteneau, R. (2015). Imitative Learning for Online Planning in Microgrids. In W. L. Woon, A. Zeyar, & M. Stuart (Eds.),

*Data Analytics for Renewable Energy Integration* (pp. 1-15). Springer. Peer reviewed

This paper aims to design an algorithm dedicated to operational planning for microgrids in the challenging case where the scenarios of production and consumption are not known in advance. Using expert ...

François-Lavet, V., Fonteneau, R., & Ernst, D. (2015). How to Discount Deep Reinforcement Learning: Towards New Dynamic Strategies.

*NIPS 2015 Workshop on Deep Reinforcement Learning*. Peer reviewed

Using deep neural nets as function approximators for reinforcement learning tasks has recently been shown to be very powerful for solving problems approaching real-world complexity. Using these results as a ...

Safadi, F., Fonteneau, R., & Ernst, D. (2015). Artificial Intelligence in Video Games: Towards a Unified Framework.

*International Journal of Computer Games Technology, 2015*, 30. Peer reviewed

With modern video games frequently featuring sophisticated and realistic environments, the need for smart and comprehensive agents that understand the various aspects of complex environments is pressing. Since ...

Fonteneau, R. (2014).

*From Bad Models to Good Policies: an Intertwined Story about Energy and Reinforcement Learning*. Paper presented at the 2014 NIPS Workshop «From Bad Models to Good Policies (Sequential Decision Making under Uncertainty)», Montreal, Canada, December 12, 2014.

Batch mode reinforcement learning is a subclass of reinforcement learning for which the decision making problem has to be addressed without model, using trajectories only (no model, nor simulator nor ...

François-Lavet, V., Fonteneau, R., & Ernst, D. (2014, December). Using approximate dynamic programming for estimating the revenues of a hydrogen-based high-capacity storage device.

*IEEE Symposium Series on Computational Intelligence*. Peer reviewed

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market ...

Fonteneau, R. (2014).

*Une Histoire d'Energie : Equations et Transition - Energy Stories, Equations and Transition*. Paper presented at Pecha Kucha Night, Liège, Belgium.

Having access to abundant energy is a key component of our societies' lifestyle. The energy transition amounts to abolishing the dependency of our societies on finite and non-renewable energy resources. Is ...

Rivadeneira, P., Moog, C., Stan, G.-B., Brunet, C., Raffi, F., Ferré, V., Costanza, V., Mhawej, M.-J., Biafore, F., Ouattara, D., Ernst, D., Fonteneau, R., & Xia, X. (2014). Mathematical Modeling of HIV Dynamics After Antiretroviral Therapy Initiation: A Review.

*BioResearch Open Access, 3*(5), 233-241. Peer reviewed (verified by ORBi)

This review shows the potential ground-breaking impact that mathematical tools may have in the analysis and the understanding of the HIV dynamics. In the first part, early diagnosis of immunological failure is ...

Moog, C., Rivadeneira, P., Stan, G.-B., Brunet, C., Raffi, F., Ferre, V., Costanza, V., Mhawej, M.-J., Ernst, D., Fonteneau, R., Biafore, F., Ouattara, D., & Xia, X. (2014). Mathematical modeling of HIV dynamics after antiretroviral therapy initiation: A clinical research study.

*AIDS Research and Human Retroviruses, 30*(9), 831-834. Peer reviewed (verified by ORBi)

Immunological failure is identified from the estimation of certain parameters of a mathematical model of the HIV infection dynamics. This identification is supported by clinical research results from an ...

Fonteneau, R. (2014).

*Energy Transition: How Can We Succeed?* Paper presented at Scientizenship and Energy - When Science and Society Meet around Energy Matters, Liège, Belgium.

How can we optimize our chance to succeed in the energy transition? 70% to 80% of our energy consumption is still from nonrenewable sources. Switching to a model that would not depend on nonrenewable energy ...

Castronovo, M., Ernst, D., & Fonteneau, R. (2014). Bayes Adaptive Reinforcement Learning versus Off-line Prior-based Policy Search: an Empirical Comparison.

*Proceedings of the 23rd Annual Machine Learning Conference of Belgium and the Netherlands (BENELEARN 2014)*. Peer reviewed

This paper addresses the problem of decision making in unknown finite Markov decision processes (MDPs). The uncertainty about the MDPs is modeled using a prior distribution over a set of candidate MDPs. The ...

Castronovo, M., Ernst, D., & Fonteneau, R. (2014). Apprentissage par renforcement bayésien versus recherche directe de politique hors-ligne en utilisant une distribution a priori: comparaison empirique.

*Proceedings des 9èmes Journées Francophones de Planification, Décision et Apprentissage*. Peer reviewed

This paper addresses the problem of sequential decision making in finite and unknown Markov decision processes (MDPs). The lack of knowledge about the MDP is modeled as a ...

François-Lavet, V., Fonteneau, R., & Ernst, D. (2014). Estimating the revenues of a hydrogen-based high-capacity storage device: methodology and results.

*Proceedings des 9èmes Journées Francophones de Planification, Décision et Apprentissage*. Peer reviewed

This paper proposes a methodology to estimate the maximum revenue that can be generated by a company that operates a high-capacity storage device to buy or sell electricity on the day-ahead electricity market ...

Fonteneau, R., Ernst, D., Boigelot, B., & Louveaux, Q. (2014). Lipschitz robust control from off-policy trajectories.

*Proceedings of the 53rd IEEE Conference on Decision and Control (IEEE CDC 2014)*. Peer reviewed

We study the minmax optimization problem introduced in [Fonteneau et al. (2011), ``Towards min max reinforcement learning'', Springer CCIS, vol. 129, pp. 61-77] for computing control policies for batch mode ...

Fonteneau, R., Murphy, S. A., Wehenkel, L., & Ernst, D. (2014). Apprentissage par renforcement batch fondé sur la reconstruction de trajectoires artificielles.

*Proceedings of the 9èmes Journées Francophones de Planification, Décision et Apprentissage (JFPDA 2014)*. Peer reviewed

This paper is set in the context of batch mode reinforcement learning, whose central problem is to learn, from a set of trajectories, a decision policy optimizing a ...

Fonteneau, R., & Prashanth L.A. (2014). Simultaneous perturbation algorithms for batch off-policy search.

*Proceedings of the 53rd IEEE Conference on Decision and Control (IEEE CDC 2014)*. Peer reviewed

We propose novel policy search algorithms in the context of off-policy, batch mode reinforcement learning (RL) with continuous state and action spaces. Given a batch collection of trajectories, we perform off ...

Fonteneau, R. (2013).

*Min Max Generalization for Deterministic Batch Mode Reinforcement Learning: Relaxation Schemes*. Paper presented at the Dutch-Belgian Reinforcement Learning Workshop, Maastricht, The Netherlands.

We study the min max optimization problem introduced in [Fonteneau et al. (2011), ``Towards min max reinforcement learning'', Springer CCIS, vol. 129, pp. 61-77] for computing policies for batch mode ...

Fonteneau, R., Busoniu, L., & Munos, R. (2013). Optimistic Planning for Belief-Augmented Markov Decision Processes.

*Proceedings of the 2013 Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-13), Singapore, 15–19 April 2013*. Peer reviewed

This paper presents the Bayesian Optimistic Planning (BOP) algorithm, a novel model-based Bayesian reinforcement learning approach. BOP extends the planning approach of the Optimistic Planning for Markov ...

Fonteneau, R., Busoniu, L., & Munos, R. (2013). Planification Optimiste dans les Processus Décisionnels de Markov avec Croyance.

*8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13)*. Peer reviewed

This paper describes the BOP algorithm (Bayesian Optimistic Planning), a new model-based (i.e., indirect) Bayesian reinforcement learning algorithm. BOP extends the ...

Fonteneau, R., Ernst, D., Boigelot, B., & Louveaux, Q. (2013). Généralisation Min Max pour l'Apprentissage par Renforcement Batch et Déterministe : Relaxations pour le Cas Général T Etapes.

*8èmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA'13)*. Peer reviewed

This paper addresses the min max generalization problem in the context of deterministic batch mode reinforcement learning. The problem was originally introduced by [Fonteneau, 2011], and it has already ...

Fonteneau, R., Ernst, D., Boigelot, B., & Louveaux, Q. (2013). Min max generalization for deterministic batch mode reinforcement learning: relaxation schemes.

*SIAM Journal on Control & Optimization, 51*(5), 3355–3385. Peer reviewed (verified by ORBi)

We study the min max optimization problem introduced in Fonteneau et al. [Towards min max reinforcement learning, ICAART 2010, Springer, Heidelberg, 2011, pp. 61–77] for computing policies for batch mode ...

Fonteneau, R., Korda, N., & Munos, R. (2013). An Optimistic Posterior Sampling Strategy for Bayesian Reinforcement Learning.

*NIPS 2013 Workshop on Bayesian Optimization (BayesOpt2013)*. Peer reviewed

We consider the problem of decision making in the context of unknown Markov decision processes with finite state and action spaces. In a Bayesian reinforcement learning framework, we propose an optimistic ...

Fonteneau, R., Murphy, S. A., Wehenkel, L., & Ernst, D. (2013). Stratégies d'échantillonnage pour l'apprentissage par renforcement batch.

*Revue d'Intelligence Artificielle [=RIA], 27*(2), 171-194. Peer reviewed (verified by ORBi)

We propose two strategies for experiment selection in the context of batch mode reinforcement learning. The first strategy is based on the idea that the most interesting experiments to carry out at some stage ...

Kedenburg, G., Fonteneau, R., & Munos, R. (2013). Aggregating Optimistic Planning Trees for Solving Markov Decision Processes.

*Advances in Neural Information Processing Systems 26 (2013)* (pp. 2382-2390). Peer reviewed

This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a ...

Fonteneau, R. (2012).

*Batch Mode Reinforcement Learning based on the Synthesis of Artificial Trajectories*. Paper presented at the Winter Meeting of the Canadian Mathematical Society, Montreal, Canada.

Batch mode reinforcement learning (BMRL) is a field of research which focuses on the inference of high-performance control policies when the only information on the control problem is gathered in a set of ...

Maes, F., Fonteneau, R., Wehenkel, L., & Ernst, D. (2012). Policy search in a space of simple closed-form formulas: towards interpretability of reinforcement learning.

*Discovery Science 15th International Conference, DS 2012, Lyon, France, October 29-31, 2012. Proceedings* (pp. 37-51). Berlin, Germany: Springer. Peer reviewed

In this paper, we address the problem of computing interpretable solutions to reinforcement learning (RL) problems. To this end, we propose a search algorithm over a space of simple closed-form formulas that ...

Fonteneau, R., Ernst, D., Boigelot, B., & Louveaux, Q. (2012). Généralisation min max pour l'apprentissage par renforcement batch et déterministe : schémas de relaxation.

*Septièmes Journées Francophones de Planification, Décision et Apprentissage pour la conduite de systèmes (JFPDA 2012)*. Peer reviewed

We consider the min max generalization problem in the context of deterministic batch mode reinforcement learning. The problem was originally introduced by Fonteneau et al. (2011). In a ...

Castronovo, M., Maes, F., Fonteneau, R., & Ernst, D. (2012). Learning exploration/exploitation strategies for single trajectory reinforcement learning.

*Proceedings of the 10th European Workshop on Reinforcement Learning (EWRL 2012)* (pp. 1-9). Peer reviewed

We consider the problem of learning high-performance Exploration/Exploitation (E/E) strategies for finite Markov Decision Processes (MDPs) when the MDP to be controlled is supposed to be drawn from a known ...

Gemine, Q., Safadi, F., Fonteneau, R., & Ernst, D. (2012). Imitative Learning for Real-Time Strategy Games.

*Proceedings of the 2012 IEEE Conference on Computational Intelligence and Games* (pp. 424-429). Peer reviewed

Over the past decades, video games have become increasingly popular and complex. Virtual worlds have gone a long way since the first arcades and so have the artificial intelligence (AI) techniques used to ...

Fonteneau, R., Ernst, D., Boigelot, B., & Louveaux, Q. (2011). Relaxation schemes for min max generalization in deterministic batch mode reinforcement learning.

*4th International NIPS Workshop on Optimization for Machine Learning (OPT 2011)*. Peer reviewed

We study the min max optimization problem introduced in [Fonteneau, 2011] for computing policies for batch mode reinforcement learning in a deterministic setting. This problem is NP-hard. We focus on the two ...

Safadi, F., Fonteneau, R., & Ernst, D. (2011). Artificial intelligence design for real-time strategy games.

*NIPS Workshop on Decision Making with Multiple Imperfect Decision Makers*. Peer reviewed

For over a decade now, real-time strategy (RTS) games have been challenging intelligence, human and artificial (AI) alike, as one of the top genres in terms of overall complexity. RTS is a prime example problem ...

Fonteneau, R. (2011).

*Recent Advances in Batch Mode Reinforcement Learning: Synthesizing Artificial Trajectories*. Paper presented at Grascomp's Day, Brussels, Belgium.

Batch mode reinforcement learning (BMRL) is a field of research which focuses on the inference of high-performance control policies when the only information on the control problem is gathered in a set of ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2011). Active exploration by searching for experiments that falsify the computed control policy.

*Proceedings of the 2011 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-11)*. Peer reviewed

We propose a strategy for experiment selection - in the context of reinforcement learning - based on the idea that the most interesting experiments to carry out at some stage are those that are the most liable ...

Fonteneau, R. (2011).

*Contributions to Batch Mode Reinforcement Learning*. Unpublished doctoral thesis, Université de Liège, Liège, Belgium.

This dissertation presents various research contributions published during these four years of PhD in the field of batch mode reinforcement learning, which studies optimal control problems for which the only ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2011). Towards min max generalization in reinforcement learning. In J. Filipe, A. Fred, & B. Sharp (Eds.),

*Agents and Artificial Intelligence: International Conference, ICAART 2010, Valencia, Spain, January 2010, Revised Selected Papers* (pp. 61-77). Springer. Peer reviewed

In this paper, we introduce a min max approach for addressing the generalization problem in Reinforcement Learning. The min max approach works by determining a sequence of actions that maximizes the worst ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010). Generating informative trajectories by using bounds on the return of control policies.

*Proceedings of the Workshop on Active Learning and Experimental Design 2010 (in conjunction with AISTATS 2010)*. Peer reviewed

We propose new methods for guiding the generation of informative trajectories when solving discrete-time optimal control problems. These methods exploit recently published results that provide ways for ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010). Model-free Monte Carlo-like policy evaluation.

*Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2010)* (pp. 217-224). Peer reviewed

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010). Model-free Monte Carlo–like policy evaluation.

*Proceedings of Conférence Francophone sur l'Apprentissage Automatique (CAp) 2010*. Peer reviewed

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2010). A cautious approach to generalization in reinforcement learning.

*Proceedings of the 2nd International Conference on Agents and Artificial Intelligence* (pp. 10). Peer reviewed

In the context of a deterministic Lipschitz continuous environment over continuous state spaces, finite action spaces, and a finite optimization horizon, we propose an algorithm of polynomial complexity which ...

Fonteneau, R., & Ernst, D. (2010).

*Voronoi model learning for batch mode reinforcement learning*. University of Liège.

We consider deterministic optimal control problems with continuous state spaces where the information on the system dynamics and the reward function is constrained to a set of system transitions. Each system ...

Fonteneau, R., Murphy, S. A., Wehenkel, L., & Ernst, D. (2010).

*Computing bounds for kernel-based policy evaluation in reinforcement learning*. University of Liège.

This technical report proposes an approach for computing bounds on the finite-time return of a policy using kernel-based approximators from a sample of trajectories in a continuous state space and deterministic framework.

Mhawej, M.-J., Brunet-Francois, C., Fonteneau, R., Ernst, D., Ferré, V., Stan, G.-B., Raffi, F., & Moog, C. H. (2009). Apoptosis characterizes immunological failure of HIV infected patients.

*Control Engineering Practice, 17*(7), 798-804. Peer reviewed (verified by ORBi)

This paper studies the influence of apoptosis in the dynamics of the HIV infection. A new modeling of the healthy CD4+ T-cells activation-induced apoptosis is used. The parameters of this model are identified ...

Fonteneau, R., Murphy, S., Wehenkel, L., & Ernst, D. (2009). Inferring bounds on the performance of a control policy from a sample of trajectories.

*Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL-09)* (pp. 117-123). Peer reviewed

We propose an approach for inferring bounds on the finite-horizon return of a control policy from an off-policy sample of trajectories collecting state transitions, rewards, and control actions. In this paper ...

Stan, G.-B., Belmudes, F., Fonteneau, R., Zeggwagh, F., Lefebvre, M.-A., Michelet, C., & Ernst, D. (2008). Modelling the influence of activation-induced apoptosis of CD4+ and CD8+ T-cells on the immune system response of a HIV-infected patient.

*IET Systems Biology, 2*(2), 94-102. Peer reviewed (verified by ORBi)

On the basis of the human immunodeficiency virus (HIV) infection dynamics model proposed by Adams, the authors propose an extended model that aims at incorporating the influence of activation-induced apoptosis ...

Fonteneau, R., Wehenkel, L., & Ernst, D. (2008).

*Variable selection for dynamic treatment regimes: a reinforcement learning approach*. Paper presented at the European Workshop on Reinforcement Learning 2008 (EWRL'08), Villeneuve d'Ascq, France. Peer reviewed

Dynamic treatment regimes (DTRs) can be inferred from data collected through some randomized clinical trials by using reinforcement learning algorithms. During these clinical trials, a large set of clinical ...