UNIVERSITY OF CALIFORNIA TRANSPORTATION CENTER

COVER SHEET FOR FINAL REPORT

1. Project Title

Reinforcement Learning in Transportation Infrastructure Management

2. PI

Samer Madanat

Professor of Civil and Environmental Engineering

114 McLaughlin Hall, #1720

University of California, Berkeley

Berkeley, CA 94720-1720

Tel: (510) 643-1084

Fax: (510) 642-1246

E-mail: madanat@ce.berkeley.edu

3. Funded by a UCTC Year 14 Research Grant


Reinforcement Learning in Transportation Infrastructure Management

Final Report

Submitted to UCTC

Samer Madanat

Department of Civil and Environmental Engineering

University of California, Berkeley

Infrastructure Management Systems support agencies in developing efficient policies to monitor, maintain and repair deteriorating facilities in transportation infrastructure networks. Traditionally, Infrastructure Management Systems have been based on a time-invariant characterization of a facility’s deterioration process. However, a single, constant model of a facility’s deterioration may not be appropriate given the variability over time of causal factors such as traffic and environmental conditions. When this variability over time is accounted for, the infrastructure management problem becomes a Reinforcement Learning problem.

One possible approach for solving this Reinforcement Learning problem is to represent the facility deterioration process using a time-varying stochastic model. The problem of finding optimal policies to manage infrastructure facilities and networks can then be formulated as an adaptive control problem, where observations of facility condition over time are used to update the parameters of the models. An alternative to this approach is to use temporal difference learning. This approach allows us to develop policies without having to model a facility’s deterioration process. Instead, the information gathered by the transportation agency is used to evaluate maintenance and repair policies directly, without using a stochastic process to represent facility deterioration. In the adaptive control approach, we capture performance model uncertainty by including beliefs about deterioration in the set of information that is used to make Maintenance, Repair, and Reconstruction (MR & R) decisions. The beliefs correspond to an agency’s assessment of which model can be used to represent the physical deterioration process.
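The temporal difference idea can be illustrated with a minimal sketch of TD(0) policy evaluation. It is not the procedure used in this research: the four-state toy facility, the costs, the 30% deterioration probability, the step size, and the names `td0_policy_evaluation`, `toy_step` and `toy_policy` are all assumptions made for illustration. The sketch also estimates a stationary cost-to-go, whereas a finite-horizon treatment would index values by period.

```python
import random


def td0_policy_evaluation(step, policy, n_states, episodes=2000, horizon=15,
                          discount=1 / 1.05, step_size=0.05, seed=0):
    """Estimate the expected discounted cost-to-go of `policy` with TD(0).

    `step(state, action, rng)` returns (next_state, cost). No transition
    probabilities are supplied: only sampled experience is used.
    """
    rng = random.Random(seed)
    value = [0.0] * n_states
    for _ in range(episodes):
        state = rng.randrange(n_states)
        for _ in range(horizon):
            action = policy(state)
            next_state, cost = step(state, action, rng)
            # Move the estimate toward the one-step bootstrapped target.
            target = cost + discount * value[next_state]
            value[state] += step_size * (target - value[state])
            state = next_state
    return value


def toy_step(state, action, rng):
    """Hypothetical facility: state 0 is best, state 3 is worst."""
    if action == "repair":
        return 0, 10.0                      # assumed repair cost
    next_state = min(state + 1, 3) if rng.random() < 0.3 else state
    return next_state, float(state)         # assumed user cost grows with state


def toy_policy(state):
    """Hypothetical policy: repair only in the worst state."""
    return "repair" if state == 3 else "do-nothing"
```

Calling `td0_policy_evaluation(toy_step, toy_policy, n_states=4)` returns one cost estimate per condition state, obtained from observed transitions and costs alone.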

Adaptive optimization models use observations of condition, obtained during the management of facilities, to update an agency’s beliefs. Over time this results in an adequate representation of the physical deterioration process. Adaptive optimization models for MR & R decision-making are introduced in Durango and Madanat (2002). In this research, we present an extension that jointly optimizes MR & R and inspection decisions for the facility-level optimization problem. The proposed formulation combines the Latent Markov Decision Process (LMDP) with the adaptive control formulation. The purpose of the adaptive optimization model in this research, therefore, is to find joint inspection and maintenance policies for infrastructure facilities under performance model uncertainty. The objective in the formulation is to minimize the total expected social cost of managing facilities over a finite planning horizon. Decision-making involves the choice of action to perform during a period, as well as whether or not to inspect at the beginning of the next period. Inspections are assumed to reveal information about the current condition of a facility.
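One way to picture how an observed condition transition updates an agency’s beliefs is a Bayes’ rule update over a finite set of candidate deterioration models. The sketch below is only an illustration under that assumption; the function name `update_beliefs`, the three-state example and its probabilities are hypothetical and are not taken from the cited papers.

```python
def update_beliefs(beliefs, models, state, action, next_state):
    """Bayes' rule over candidate deterioration models.

    beliefs: dict mapping model name -> prior probability
    models:  dict mapping model name -> {action: transition matrix}, where
             matrix[i][j] = P(next condition j | condition i, action)
    """
    weighted = {
        name: beliefs[name] * models[name][action][state][next_state]
        for name in beliefs
    }
    total = sum(weighted.values())
    if total == 0.0:
        # The observation is impossible under every model: keep the prior.
        return dict(beliefs)
    return {name: weight / total for name, weight in weighted.items()}


# Hypothetical three-state example with a single "do-nothing" action.
example_models = {
    "slow":   {"do-nothing": [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]},
    "medium": {"do-nothing": [[0.7, 0.3, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]]},
    "fast":   {"do-nothing": [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5], [0.0, 0.0, 1.0]]},
}
prior = {"slow": 1 / 3, "medium": 1 / 3, "fast": 1 / 3}

# Observing deterioration from state 0 to state 1 shifts weight toward "fast".
posterior = update_beliefs(prior, example_models, 0, "do-nothing", 1)
```

In this toy case the likelihoods of the observed transition are 0.1, 0.3 and 0.5, so the posterior weights are 1/9, 1/3 and 5/9 for the slow, medium and fast models respectively.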

A computational study was performed in the context of pavement management with a planning horizon of 15 years and a 5% discount rate. As in Carnahan et al. (1987), we assumed that pavement condition is represented by 8 states. The agency can choose from the following MR & R actions: (1) do-nothing, (2) routine maintenance, (3) 1-in overlay, (4) 2-in overlay, (5) 4-in overlay, (6) 6-in overlay, and (7) reconstruction. Three possible deterioration models were considered: (1) slow, (2) medium, and (3) fast. Each model was characterized by a set of 7 transition probability matrices (one for each action). The user costs were taken from Durango and Madanat (2002), and the inspection cost of $0.065/lane-yard was taken from Madanat and Ben-Akiva (1994). We ignored measurement error in this study. The models were taken from Durango and Madanat (2002) and are such that

- The effect of MR & R actions in transition was assumed to follow a truncated normal distribution with the mean depending on the action and the model and the variance depending on the model;

- Actions were less effective in improving pavement condition under faster deterioration models; and

- Faster deterioration models had higher variance in forecasting.
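The first property above can be sketched as follows: one row of a transition probability matrix is obtained by discretizing a normal distribution for the change in condition and truncating it to the feasible states. This is an assumed construction for illustration, not the one in Durango and Madanat (2002); the state coding (0 to 7, higher is better), the unit-width bins, the function names and the numeric inputs are all hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def transition_row(state, mean_change, sd, n_states=8):
    """P(next state | state) for one action under one deterioration model.

    The change in condition is Normal(mean_change, sd), truncated so that
    the next state stays within 0 .. n_states - 1. Each state receives the
    probability mass of the unit-width bin centered on it.
    """
    center = state + mean_change
    mass = (normal_cdf(n_states - 0.5, center, sd)
            - normal_cdf(-0.5, center, sd))
    if mass <= 0.0:
        raise ValueError("no probability mass falls inside the feasible states")
    row = []
    for nxt in range(n_states):
        p = normal_cdf(nxt + 0.5, center, sd) - normal_cdf(nxt - 0.5, center, sd)
        row.append(p / mass)  # renormalize after truncation
    return row


def transition_matrix(mean_change, sd, n_states=8):
    """Stack one row per current state into a full transition matrix."""
    return [transition_row(s, mean_change, sd, n_states) for s in range(n_states)]


# Hypothetical inputs: do-nothing loses one state on average, with sd 0.8.
example_row = transition_row(state=4, mean_change=-1.0, sd=0.8)
```

Under this construction, a faster deterioration model would use a more negative `mean_change` for do-nothing, a smaller improvement for each repair action, and a larger `sd`, in line with the three properties listed above.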

As expected, the expected cost was lowest when the initial beliefs were close to the physical process. A noteworthy result is that the non-informative initial beliefs were the worst in all cases. This result seems to indicate that inaccurate, but precise, beliefs about the deterioration model are preferred to less biased beliefs of higher variance. To understand this counterintuitive result, we conducted a simulation study. The simulation study showed that the expected cost is higher when the initial beliefs about deterioration have a higher variance attached to them. In other words, reducing the initial variance in model uncertainty is more important than reducing the initial bias. This means that providing the wrong information is less costly than providing no information about deterioration. The reason for this result is that the beliefs about deterioration can be adjusted drastically and quickly in response to unexpected events.

We relaxed the constraint of annual inspections and performed a computational study. This relaxation led to a reduction in expected costs, because an inspection is performed only when it provides information that will improve future decisions. As a result, the expected number of inspections is reduced. A remarkable feature of this computational study is that, when the variance in the beliefs about deterioration is low, the larger reduction in expected cost is observed when the initial beliefs are adequate. This indicates that the benefit of a flexible inspection schedule is greater when inspections provide less information, which is an intuitive result.

The scope of this research was purposely limited to the facility level of the MR & R problem. An immediate extension is to adapt the formulation to the network-level problem with administrative restrictions. A possible approach to incorporating network-level constraints is to formulate the model developed herein using randomized policies and to solve it using linear programming.
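As a rough illustration of what such a linear program could look like, the following is the standard finite-horizon formulation for a constrained Markov decision process, not a formulation taken from this research. All symbols are assumptions: $x_t(s,a)$ is the fraction of the network in information state $s$ receiving action $a$ in period $t$, $c(s,a)$ is the social cost, $m(s,a)$ is the agency cost, $B_t$ is a hypothetical budget standing in for an administrative restriction, $\alpha(s)$ is the initial distribution, $\delta$ is the discount factor, and $T$ is the horizon. In the adaptive setting, $s$ would have to include the beliefs as well as the condition.

```latex
\begin{align*}
\min_{x \ge 0} \quad
  & \sum_{t=0}^{T-1} \sum_{s} \sum_{a} \delta^{t}\, c(s,a)\, x_t(s,a) \\
\text{s.t.} \quad
  & \sum_{a} x_0(s,a) = \alpha(s)
    && \forall s, \\
  & \sum_{a} x_{t+1}(s',a) = \sum_{s} \sum_{a} p(s' \mid s,a)\, x_t(s,a)
    && \forall s',\; t = 0,\dots,T-2, \\
  & \sum_{s} \sum_{a} m(s,a)\, x_t(s,a) \le B_t
    && t = 0,\dots,T-1.
\end{align*}
```

A randomized policy is then recovered as $\pi_t(a \mid s) = x_t(s,a) / \sum_{a'} x_t(s,a')$ wherever the denominator is positive. The budget constraint is what can make the optimal policy randomized rather than deterministic.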


REFERENCES

Papers to Date:

  1. Durango P. and Madanat S., “Optimal Maintenance and Repair Policies for Infrastructure Facilities under Uncertain Deterioration Rates: An Adaptive Control Approach”, Transportation Research, Part A, Vol. 36, No. 9, Elsevier Science, 2002.
  2. Guillaumot V., Durango P. and Madanat S., “Adaptive Optimization of Infrastructure Maintenance and Inspection Policies under Performance Model Uncertainty”, accepted for publication in the ASCE Journal of Infrastructure Systems.
  3. Durango P., “Reinforcement Learning in Infrastructure Management”, Proceedings of the Conference on Application of Advanced Technologies to Transportation, ASCE.

Conferences Attended:

  1. Second International Symposium on Transportation Infrastructure Management, Berkeley, CA, October 2001.
  2. Transportation Research Board Annual Meeting, Washington, DC, January 2002.
  3. Application of Advanced Technologies to Transportation, Boston, MA, August 2002.

Other Accomplishments:

One graduate student funded by this project, Pablo Durango, completed his dissertation in August 2002 and is currently the Louis Berger Assistant Professor of Transportation in the Department of Civil and Environmental Engineering at Northwestern University.