KDD Cup 2017 Competition Tasks

Highway Tollgates Traffic Flow Prediction

Travel Time & Traffic Volume Prediction

Background

Highway tollgates are well-known bottlenecks in traffic networks. During rush hours, long queues at tollgates can overwhelm traffic management authorities. Effective preemptive countermeasures are needed to address this challenge. Such countermeasures include expediting toll collection and streamlining future traffic flow. Toll collection can be expedited simply by allocating temporary toll collectors to open more lanes. Future traffic flow can be streamlined by adaptively adjusting traffic signals at upstream intersections. Preemptive countermeasures work only when the traffic management authorities receive reliable predictions of future traffic flow. For example, if heavy traffic is predicted for the next hour, traffic regulators could immediately deploy additional toll collectors and/or divert traffic at upstream intersections.
Traffic flow patterns vary with stochastic factors such as weather conditions, holidays, and time of day. Predicting future traffic flow and ETA (Estimated Time of Arrival) is a known challenge. An unprecedentedly large amount of traffic data from mobile apps such as Waze (in the US) or Amap (in China) can help us take up that challenge. If the contestants in this proposed KDD CUP can design reliable approaches for future traffic flow and ETA prediction, then the traffic management authorities might be able to capitalize on big data and algorithms to reduce congestion at tollgates.

Tasks

Available datasets are: the road network topology in the target area (Figures 1, 3, and 4, Tables 3 and 4), vehicle trajectories (Table 5), historical traffic volume at tollgates (Table 6), and weather data (Table 7). The contest consists of two tasks with the details below.

Task 1: To estimate the average travel time from designated intersections to tollgates

For every 20-minute time window, please estimate the average travel time of vehicles for a specific route (shown in Figure 1).

a. Routes from Intersection A to Tollgates 2 & 3;
b. Routes from Intersection B to Tollgates 1 & 3;
c. Routes from Intersection C to Tollgates 1 & 3.
Note: the ETA of a 20-minute time window for a given route is the average travel time of all vehicle trajectories that enter the route in that time window. Each 20-minute time window is defined as a right half-open interval, e.g., [2016-09-18 23:40:00, 2016-09-19 00:00:00).
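
The window definition in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`window_start`, `format_window`, `average_travel_times`) and the `(entry_time, travel_seconds)` input layout are hypothetical, and the bracketed rendering simply mirrors the half-open interval example in the note.

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=20)
FMT = "%Y-%m-%d %H:%M:%S"


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 20-minute window."""
    return ts.replace(minute=ts.minute - ts.minute % 20, second=0, microsecond=0)


def format_window(start: datetime) -> str:
    """Render a window as '[start,end)' with no space after the comma."""
    return f"[{start.strftime(FMT)},{(start + WINDOW).strftime(FMT)})"


def average_travel_times(trips):
    """Average travel time per window for one route.

    trips: iterable of (entry_time, travel_seconds) pairs (assumed layout).
    A trip belongs to the window in which it ENTERS the route.
    """
    buckets = defaultdict(list)
    for entry_time, seconds in trips:
        buckets[window_start(entry_time)].append(seconds)
    return {start: sum(v) / len(v) for start, v in buckets.items()}
```

Because each window is right half-open, a vehicle entering at exactly 00:00:00 falls into the window that starts at 00:00:00, not the one that ends there.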

Submission Format (see Table 1)
The data types used in all tables in this document are int, float, string, date and datetime. The date and datetime types follow the formats “yyyy-MM-dd” and “yyyy-MM-dd HH:mm:ss”. The time_window field consists of two datetime values separated by a comma with no space, e.g., “2016-09-18 08:40:00,2016-09-18 09:00:00”.

Field	Type	Description
intersection_id	string	intersection ID
tollgate_id	string	tollgate ID
time_window	string	e.g., [2016-09-18 08:40:00,2016-09-18 09:00:00)
avg_travel_time	float	average travel time (seconds)

Table 1. Travel Time from Intersections to Tollgates

Task 2: To predict average tollgate traffic volume

For every 20-minute time window, please predict the entry and exit traffic volumes at tollgates 1, 2 and 3 (Figures 1 and 2). Note that tollgate 2 only allows traffic entering the highway while others allow traffic both ways (entry and exit). Therefore, we need to predict the volume for 5 tollgate-direction pairs in total.

Submission Format (see Table 2)

Field	Type	Description
tollgate_id	string	tollgate ID
time_window	string	e.g., [2016-09-18 08:40:00,2016-09-18 09:00:00)
direction	string	0: entry, 1: exit
volume	int	total volume

Table 2. Traffic Volume at Tollgates

Training & Testing Datasets:

At the beginning of the contest, contestants are to predict traffic for specific rush hours from Oct. 18th to Oct. 24th. On May 25 there will be a data swap, after which the participants need to predict traffic during rush hours from Oct. 25th to Oct. 31st.

Contestants are to predict the ensuing traffic during the red time slots shown in Figure 2, i.e., 08:00 - 10:00 and 17:00 - 19:00, at 20-minute intervals.

Figure 2. Time Windows for Traffic Prediction
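
As a rough sketch, the windows to predict on a given day can be enumerated as follows. The slot boundaries come from the text above; the function name `prediction_windows` and the tuple layout are assumptions made for illustration.

```python
from datetime import date, datetime, time, timedelta

WINDOW = timedelta(minutes=20)
# Prediction slots stated in the text: 08:00 - 10:00 and 17:00 - 19:00.
SLOTS = [(time(8, 0), time(10, 0)), (time(17, 0), time(19, 0))]


def prediction_windows(day: date):
    """Return (start, end) pairs for every 20-minute window to predict on `day`."""
    windows = []
    for slot_start, slot_end in SLOTS:
        start = datetime.combine(day, slot_start)
        end = datetime.combine(day, slot_end)
        while start < end:
            windows.append((start, start + WINDOW))
            start += WINDOW
    return windows
```

Each two-hour slot holds six 20-minute windows, so a day contributes twelve windows per route or per tollgate-direction pair.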

For travel time prediction, the initial training set contains data gathered from July 19th to Oct. 17th. For volume prediction, the initial training set contains data gathered from Sep. 19th to Oct. 17th. After the data swap on May 25, additional training data from Oct. 18th to Oct. 24th will be added for both prediction tasks.
In the testing datasets, contestants are provided with traffic data during the green time slots shown in Figure 2, i.e., 06:00 - 08:00 and 15:00 - 17:00. Contestants can use that information as a leading indicator of traffic in the next two hours, which is to be predicted.
Note: Contestants are not restricted to using only the previous 2-hour data in prediction. However, each prediction may use only traffic data from before the predicted time window. For example, contestants are NOT allowed to use the traffic data from Oct. 20th to predict the traffic on Oct. 19th.
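
One way to respect this rule in a feature pipeline is to filter records by timestamp before building features for each window. The sketch below assumes a hypothetical `(timestamp, payload)` record layout and reads "before the predicted time window" as strictly earlier than the window's start.

```python
from datetime import datetime


def usable_records(records, predicted_window_start: datetime):
    """Keep only records timestamped strictly before the predicted window starts.

    records: iterable of (timestamp, payload) tuples (assumed layout).
    """
    return [(ts, payload) for ts, payload in records if ts < predicted_window_start]
```

Applying the filter per predicted window, rather than once per day, guards against accidentally leaking later observations into earlier predictions.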

Evaluation Metrics

We choose Mean Absolute Percentage Error (MAPE) to evaluate the result.
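Task 1: Let $d_{rt}$ and $p_{rt}$ be the actual and predicted average travel time for route r during time window t. The MAPE for travel time prediction is defined as:
$$
MAPE=\frac{1}{R} \sum_{r=1}^{R}\left(\frac{1}{T} \sum_{t=1}^{T}\left\lvert\frac{d_{rt}-p_{rt}}{d_{rt}}\right\rvert\right)
$$

R and T are the number of routes and the number of time windows to predict in the testing period, respectively.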

Task 2: Let C be the number of tollgate-direction pairs (as mentioned above: 1-entry, 1-exit, 2-entry, 3-entry and 3-exit), T be the number of time windows in the testing period, and $f_{ct}$ and $p_{ct}$ be the actual and predicted traffic volume for a specific tollgate-direction pair c during time window t. The MAPE for traffic volume prediction is defined as:
$$
MAPE=\frac{1}{C} \sum_{c=1}^{C}\left(\frac{1}{T} \sum_{t=1}^{T}\left\lvert\frac{f_{ct}-p_{ct}}{f_{ct}}\right\rvert\right)
$$
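
For local validation, the MAPE definition can be computed with a short helper. This is a minimal sketch, not the organizers' scoring code: the function name `mape` and the dictionary layout keyed by `(group, time_window)` are assumptions, where a group is a route in Task 1 or a tollgate-direction pair in Task 2.

```python
from collections import defaultdict


def mape(actual, predicted):
    """Mean over groups of the per-group mean absolute percentage error.

    actual, predicted: dicts mapping (group, time_window) -> value (assumed layout).
    Raises ZeroDivisionError if an actual value is zero, as the metric is
    undefined in that case.
    """
    per_group = defaultdict(list)
    for (group, window), truth in actual.items():
        error = abs((truth - predicted[(group, window)]) / truth)
        per_group[group].append(error)
    group_means = [sum(errs) / len(errs) for errs in per_group.values()]
    return sum(group_means) / len(group_means)
```

When every group has the same number of time windows, this matches the nested average in the formula: errors are averaged within each group first, then across groups.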

Data Description

The road network (Figure 1) used here is a directed graph formed by interconnected road links (Figure 3). A route (Figure 4) in the network is represented by a sequence of links. For every road link, its vehicle traffic comes from one or more “incoming road links” and goes into one or more “outgoing road links”. Table 3 and Figure 3 describe road links.

Vehicles traveling from road intersections to highway tollgates have limited route options. For each intersection-tollgate pair, only the most important route is listed in Table 4. For example, Figure 4 illustrates the route with 9 consecutive road links from Intersection B to tollgate 1.
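
The link-and-route structure described above maps naturally onto an adjacency dictionary. The sketch below checks that a route is a connected sequence of links. The function name, the dictionary layout, and the link IDs in the usage example are all hypothetical and are not taken from Tables 3 or 4.

```python
def is_valid_route(route, outgoing):
    """Check that each link in `route` leads into the next one.

    route: list of link IDs in travel order.
    outgoing: dict mapping a link ID to the set of its outgoing link IDs.
    """
    return all(
        nxt in outgoing.get(cur, set())
        for cur, nxt in zip(route, route[1:])
    )


# Hypothetical three-link network: link "a" feeds "b" and "c"; "b" feeds "c".
outgoing_links = {"a": {"b", "c"}, "b": {"c"}, "c": set()}
print(is_valid_route(["a", "b", "c"], outgoing_links))
print(is_valid_route(["c", "a"], outgoing_links))
```

Because the graph is directed, a route traversed in reverse is generally not valid, which is why the second call reports a failure.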

Table 5 introduces the time-stamped records of actual vehicles along the routes from road intersections to highway tollgates.