Brno Urban Dataset -- The New Data for Self-Driving Agents and Mapping Tasks

Autonomous driving is a dynamically growing field of research, where the quality and amount of experimental data are critical. Although several rich datasets are available these days, the demands of researchers and the technical possibilities are evolving. Through this paper, we bring a new dataset recorded in Brno, Czech Republic. It offers data from four WUXGA cameras, two 3D LiDARs, an inertial measurement unit, an infrared camera and, especially, a differential RTK GNSS receiver with centimetre accuracy, which, to the best of the authors' knowledge, is not available from any other public dataset so far. In addition, all the data are timestamped with sub-millisecond precision to allow a wider range of applications. At the time of publication of this paper, recordings of more than 350 km of rides in varying environments are shared at: https://github.com/RoboticsBUT/Brno-Urban-Dataset.


I. INTRODUCTION
Research in the domain of autonomous mobile vehicles has expanded tremendously in the last few years [1], [2], [3]. From being one of many possible applications of general mobile robotics [4] and a geeky interest of technical visionaries, it has become a large topic for both the scientific and commercial sectors. Besides the undoubted motivation of financial bounties and the pursuit of emerging trends, this boom is also fueled by openly available data, which allow more people to be part of it. Equipping a car with state-of-the-art sensors can easily become too expensive for small entities such as start-ups or research groups at local universities. Sharing data allows many more researchers to participate in the progress of the field and enrich it with novel ideas, which, in the end, rewards everybody [5]. A second good reason for data sharing is the possibility to bypass the need to build and maintain the sensory apparatus, which otherwise requires extra resources and engineering skills unrelated to the actual research topic of artificial intelligence. Having had the opportunity to build our own data acquisition system, which exceeds the current state of the art in some of its parameters, we have decided to make the data publicly available.
Through this paper, we provide an urban dataset recorded in Brno, Czech Republic, and its near surroundings. The location offers highly diverse environments, from highways to farm roads and from densely built-up areas to woods, all recorded in real traffic conditions. The sensory system contains all the sensors that are considered standard these days [1], [6]: frontal stereo cameras, lateral cameras, two 3D laser scanners (LiDARs), an inertial measurement unit (IMU) and a global navigation satellite system (GNSS) receiver. Beyond that, three key features make the system unique. First, to the best of our knowledge, it is the only measurement system equipped with a differential real-time kinematic (RTK) GNSS receiver providing centimetre-accurate global positioning and heading at the same time. Second, a thermal camera senses the scene in front of the car, which greatly enhances vulnerable road user awareness and the general detection of various objects in bad weather conditions. Third, most of the system is precisely synchronized and timestamped using the GPS signal with sub-millisecond precision.

Fig. 1. Detail of the ATLAS sensory framework, which has been used to record all the data published in the Brno Urban Dataset. The image shows four RGB cameras, a single thermal (IR) camera, two Velodyne HDL-32E LiDARs, the RTK GNSS receiver with a pair of differential antennas and the IMU in the center of the frame.

The research was supported by ECSEL JU under the project H2020 737469 AutoDrive: Advancing fail-aware, fail-safe, and fail-operational electronic components, systems, and architectures for fully automated driving to make future mobility safer, affordable, and end-user acceptable. This research has been financially supported by the Ministry of Education, Youth and Sports of the Czech Republic under the project CEITEC 2020 (LQ1601).
Such data have a wide range of uses. From a practical point of view, having a reliable reference eliminates many problems in data processing. In general pose estimation, the benefits of differential GNSS are well understood [7], [8], and its application in autonomous road vehicles seems reasonable. Similarly, issues arising from poor synchronization of cameras are the subject of dedicated research that has come up with creative ways of dealing with them [9], [10], [11], but our system allows such problems to be avoided entirely, which is likely to be the case in modern autonomous vehicles as well.
Another possible application of our data is the development of the previously mentioned methods for dealing with uncertainty, because on some occasions their adoption is necessary. Exact timestamping allows the evaluation of such methods, in which the timing of the data acquisition is part of the stochastic model [12], [13]. With precise global localization, we can benchmark methods relying on local sensors such as cameras [10], [14], LiDARs [15], [16] and their combinations, frequently enhanced with inertial measurement units [17]. As we will show in the next section, most other datasets serve a different purpose and would be of limited use in such experiments.
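To illustrate the benchmarking use case, the sketch below computes a simple absolute trajectory error (the RMSE of position differences) of an estimated trajectory against an RTK reference, interpolating the reference at the timestamps of the estimate. It assumes that both trajectories are already expressed in the same local metric frame (e.g. east-north-up) and share the GPS time base; the usual rigid alignment step (e.g. Horn or Umeyama) is deliberately omitted, and the function names are hypothetical.

```python
import numpy as np

def interpolate_reference(ref_t, ref_xyz, query_t):
    """Linearly interpolate reference positions (N x 3) at the query timestamps.

    ref_t must be increasing; np.interp clamps queries outside its span
    to the first/last reference sample.
    """
    ref_t = np.asarray(ref_t, dtype=float)
    ref_xyz = np.asarray(ref_xyz, dtype=float)
    return np.column_stack(
        [np.interp(query_t, ref_t, ref_xyz[:, k]) for k in range(3)]
    )

def absolute_trajectory_error(est_t, est_xyz, ref_t, ref_xyz):
    """RMSE of the Euclidean position error against the interpolated reference.

    No rigid-body alignment is performed: both trajectories are assumed to
    be in the same frame already.
    """
    ref_at_est = interpolate_reference(ref_t, ref_xyz, est_t)
    errors = np.linalg.norm(np.asarray(est_xyz, dtype=float) - ref_at_est, axis=1)
    return float(np.sqrt(np.mean(errors ** 2)))
```

Interpolating the reference (rather than nearest-neighbour matching) only makes sense because the timestamps of both sources share a common, precise time base.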

II. RELATED WORKS
Publicly available datasets are rapidly evolving in step with the fast technological progress of the last two decades. Sensor accuracy and especially resolution have grown substantially, which has led to larger memory requirements for storage and higher bandwidth needs for sharing the recordings. Today's technology meets all these demands, which makes older datasets obsolete quite fast. For example, going through the review in the Málaga dataset paper [18], we see that datasets captured five to ten years ago provide mostly 2D laser scans and camera resolutions of a fraction of a megapixel, and the largest had tens of kilometers of recorded path. Comparing them with more recent surveys [6] and [1], the standard has risen significantly. Unfortunately, many large datasets are not freely available [19]. The following paragraphs cover only the state of the art represented by those open to the public.

A. Special Purpose Datasets
This review would be incomplete without a note on specialized datasets for distinct tasks in automated driving. The traffic sign recognition problem [20] has long been solved, with dedicated datasets such as [21], [22]. The same holds for traffic light detection and interpretation [23]. Other datasets are focused more on mobile devices and augmented reality; therefore, they do not cover traffic from the perspective of the vehicle, but rather from that of a pedestrian, e.g. [14]. Another distinguishable group are the synthetic datasets such as SYNTHIA [24] or P.F.B. [25], which are distinctive for their precisely computed yet somewhat simplified output data. The WildDash dataset [26] focuses on data where image segmentation algorithms fail. Unlike its larger counterparts, this dataset is not meant for training algorithms, but mainly for testing them.

B. Vision-Focused Datasets
Many datasets are designed to serve mostly machine vision research, especially the segmentation of real-life scenes and the recognition of objects of interest. A large number of very short recordings with a few manually annotated frames and crude GPS positioning is usually all that is available. The key contribution lies in a large variety of scenes (often acquired through crowdsourcing) and reliable reference data. Examples include the CityScapes dataset [27] and the Mapillary Vistas dataset [28].

C. General Mapping Datasets
Arguably the most important group of datasets strongly focuses on sensory quality and variety, which enables a wide range of research applications. The vision subsystems are usually of higher quality and accompanied by laser scanners and inertial units. The most notable these days are the KITTI [30], the Málaga urban [18], the Oxford RobotCar [31] and the ApolloScape [32] datasets, summarized in Tab. I.
The size of the datasets varies a lot: the KITTI and Málaga recordings cover only tens of kilometers, while the Oxford and ApolloScape sets map 1000+ kilometers of roads. The treatment of timing and synchronization is unique in each case: the Oxford dataset timing has excellent sub-millisecond precision, the authors of the ApolloScape set claim synchronization without explaining any details, and KITTI and Málaga provide less precisely timed and mostly unsynchronized data.

D. Summary
The previous overview covers many datasets relevant to autonomous driving, but only a few are of the kind we are presenting. Excluding the vision-only datasets [27], [28], [29] and [26], we are left with four projects providing data similar to ours. The KITTI [30], Málaga [18] and Oxford [31] datasets offer lower-quality sensory data, while ApolloScape [32] is clearly the richest dataset available in terms of camera resolution and point cloud density. On the other hand, none of these four datasets contains differential RTK GPS and, although synchronization and timestamping are mentioned in all of them, only the Oxford dataset presents a detailed treatment of the topic, using dedicated software. We have achieved the same accuracy with more stable results thanks to hardware precautions. Although we offer only raw data, the Brno Urban Dataset has features that reach beyond the current state of the art.

III. THE DATA ACQUISITION PLATFORM
The dataset has been recorded using an extensible sensory platform called ATLAS, built in our laboratory. The system is composed of a communication network, a data processing computer, a synchronization unit and the sensors themselves. The whole apparatus was designed with precision and modularity in mind, allowing a wide range of experimental setups while maintaining the quality of the recordings. The following paragraphs describe the current state of the ATLAS platform (see Fig. 1), and the Conclusion summarizes the upcoming extensions.

A. Data Gathering Infrastructure
Because we deal with a large throughput of raw data, the central control and recording computer is built on a CPU with 64 PCIe lanes and an NVMe SSD. An Nvidia GTX 1080 Ti graphics card is also present for video processing and compression. The backbone of the ATLAS recording framework is an Ethernet network. It is based on the IP communication protocol, and all sensors are connected to the acquisition PC via a high-speed switch with an uplink of up to 18 Gb/s to the PC. The only exception is the IMU, which is connected to the PC through a virtual serial link over USB.
It should be noted that neither the recording PC nor the Ethernet network has real-time capabilities for precise and reliable timing; this functionality is handled independently, as discussed in Sec. III-D.
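To give a feel for why a high-bandwidth link is needed, the sketch below estimates the raw, uncompressed image bandwidth of the four WUXGA (1920×1200) cameras. The frame rate and bytes per pixel are hypothetical inputs chosen for illustration; the actual camera parameters are listed in Table II.

```python
WUXGA_WIDTH, WUXGA_HEIGHT = 1920, 1200  # WUXGA resolution in pixels

def camera_bandwidth_mbps(n_cameras, fps, bytes_per_pixel,
                          width=WUXGA_WIDTH, height=WUXGA_HEIGHT):
    """Raw (uncompressed) image bandwidth of n identical cameras, in Mb/s."""
    bytes_per_second = n_cameras * width * height * bytes_per_pixel * fps
    return bytes_per_second * 8 / 1e6

# Hypothetical operating points, not the ATLAS configuration:
# 8-bit raw Bayer at 10 fps, and 24-bit RGB at 20 fps.
bayer_10fps = camera_bandwidth_mbps(n_cameras=4, fps=10, bytes_per_pixel=1)
rgb_20fps = camera_bandwidth_mbps(n_cameras=4, fps=20, bytes_per_pixel=3)
```

Under these assumptions the cameras alone need roughly 0.74 Gb/s and 4.4 Gb/s respectively, before the LiDAR and thermal streams are added.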

B. Sensory Equipment
Sensors are clearly a crucial part of the system. Table II summarizes all the devices currently installed and their most important parameters. In the next paragraphs, we briefly elaborate on each sensor category to better present the data we provide.
As seen in Sec. II, RGB cameras are a must in autonomous vehicle applications. The ATLAS platform has two cameras installed in the front for stereo vision and two lateral cameras with a wider field of view (FoV) to better cover crossroads, walkways and other road users passing by. The frontal cameras are mounted far apart (∼70 cm) to improve the accuracy of distance estimation. With the full setup, we cover more than 220° of the car's surroundings.
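The benefit of the wide baseline follows from the standard stereo relation Z = f·B/d: differentiating gives a depth uncertainty of roughly Z²·δd/(f·B), so the error shrinks in proportion to the baseline B. The sketch below evaluates this first-order estimate; the focal length (1000 px), disparity error (0.5 px) and the 12 cm comparison baseline are hypothetical values, not ATLAS parameters.

```python
def stereo_depth_error(depth_m, baseline_m, focal_px, disparity_err_px=0.5):
    """First-order depth uncertainty of a rectified stereo pair, in metres.

    From Z = f * B / d we get |dZ/dd| = Z**2 / (f * B), so a disparity
    error of disparity_err_px pixels maps to about Z**2 * err / (f * B).
    """
    return depth_m ** 2 * disparity_err_px / (focal_px * baseline_m)

# Hypothetical comparison at 30 m: ~70 cm baseline versus a compact 12 cm rig.
wide = stereo_depth_error(depth_m=30.0, baseline_m=0.7, focal_px=1000.0)
narrow = stereo_depth_error(depth_m=30.0, baseline_m=0.12, focal_px=1000.0)
```

With these assumed values, the wide baseline gives about 0.64 m of depth uncertainty at 30 m, versus 3.75 m for the compact rig.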
The next important sensors are LiDARs. We employ two 3D Velodyne scanners mounted with a slight tilt around the forward-pointing axis of the car, as can be seen in the photograph in Fig. 1. The reason is twofold: first, the scanners better cover the area to the sides of the car and, second, the rays on the opposite side can measure higher obstacles instead of needlessly scanning the roof of the car.
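The effect of such a tilt can be sketched as a rotation about the forward (x) axis. Assuming, hypothetically, an x-forward, y-left, z-up convention and a left-mounted scanner rolled by -10° (the real angles come from the extrinsic calibration), a horizontal beam pointing outward is lowered towards the ground while the beam pointing across the car is raised above the roof.

```python
import numpy as np

def roll_matrix(roll_rad):
    """Rotation about the forward (x) axis; frame: x forward, y left, z up."""
    c, s = np.cos(roll_rad), np.sin(roll_rad)
    return np.array([[1.0, 0.0, 0.0],
                     [0.0, c, -s],
                     [0.0, s, c]])

def sensor_to_vehicle(points, roll_rad):
    """Rotate N x 3 sensor-frame points into the vehicle frame (rotation only)."""
    return np.asarray(points, dtype=float) @ roll_matrix(roll_rad).T

def elevation_deg(points):
    """Elevation angle of each point above the vehicle's horizontal plane."""
    pts = np.asarray(points, dtype=float)
    return np.degrees(np.arcsin(pts[:, 2] / np.linalg.norm(pts, axis=1)))

# Hypothetical left scanner rolled by -10 degrees: horizontal beams pointing
# left (outward) and right (across the car) in the sensor frame.
beams = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
tilted = elevation_deg(sensor_to_vehicle(beams, np.radians(-10.0)))
```

Here the outward beam ends up 10° below the horizon and the inward beam 10° above it.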
Thermal camera usage is unique among existing datasets for automated driving. The device we employ senses infrared radiation in the range of 7.5-13.5 μm, which corresponds to the peak wavelengths emitted by objects at usual temperatures of -40 to +80 °C. The resolution is low in comparison to RGB cameras, but the opacity of many objects (e.g. smoke, fog, thin foil) differs for infrared and visible light, so even with 640×512 pixels per frame, the information gain is substantial. The camera is mounted in the most important, forward-looking direction.
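The correspondence between the spectral band and the temperature range follows from Wien's displacement law, λ_max = b/T with b ≈ 2897.77 μm·K. The short check below gives peak wavelengths of about 12.4 μm at -40 °C and 8.2 μm at +80 °C, both inside the 7.5-13.5 μm band of the camera.

```python
WIEN_B_UM_K = 2897.771955  # Wien's displacement constant in micrometre-kelvin

def peak_wavelength_um(temp_celsius):
    """Blackbody peak emission wavelength in micrometres (Wien's law)."""
    return WIEN_B_UM_K / (temp_celsius + 273.15)

cold_peak = peak_wavelength_um(-40.0)  # coldest temperature quoted above
warm_peak = peak_wavelength_um(80.0)   # warmest temperature quoted above
```

Note that the law describes ideal blackbodies; real objects have emissivities below one, which is why a band rather than a single wavelength is sensed.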
The fourth kind of sensor mounted on the ATLAS platform is an inertial measurement unit. Besides accelerometers and gyroscopes, the device contains a combined GNSS receiver (GPS, GLONASS, Galileo, BeiDou) and provides additional environmental measurements, such as temperature, atmospheric pressure and magnetic field, which are of limited use in automated driving, but which we publish as well for completeness.
Last but not least, there is a separate combined GNSS receiver to obtain the most precise global localization available. The RTK functionality allows centimetre-level accuracy. Additionally, the receiver supports two antennas at the same time, which allows a heading vector in global coordinates to be obtained directly. This feature is not present in any other dataset we are aware of and can be very valuable as a reference in map building and localization applications. The receiver also provides diagnostics of measurement reliability as well as precise time for the other sensors. This feature plays a key role in our solution of synchronization and timestamping, described in Sec. III-D.
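A minimal sketch of how a heading can be derived from the two antenna positions, assuming both are already converted to a local east-north-up (ENU) frame and that the antenna baseline is aligned with the longitudinal axis of the car (any mounting offset would come from the calibration). The function name is hypothetical; the receiver itself may report heading directly.

```python
import math

def heading_deg(main_enu, aux_enu):
    """Heading of the main -> auxiliary antenna baseline.

    Measured clockwise from north, in degrees within [0, 360).
    Positions are (east, north, up) tuples in metres.
    """
    d_east = aux_enu[0] - main_enu[0]
    d_north = aux_enu[1] - main_enu[1]
    return math.degrees(math.atan2(d_east, d_north)) % 360.0

# Auxiliary antenna 1 m east of the main one: the baseline points east (90 deg).
east_heading = heading_deg((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

Because the heading comes from a difference of two positions, its angular error scales roughly with the relative position error divided by the antenna separation, which favours a long baseline.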

C. Calibration
So far, we have spoken about sensor poses within the ATLAS platform in a somewhat vague manner. The reason lies in the difficulty of exact measurement. Each sensor has its own coordinate frame, whose origin usually lies within the device and is tied to the chassis by a few dimensions with a certain tolerance. Although the mutual position could be measured with decent accuracy, orientation measurement is very sensitive, and even a small error could result in faulty alignment of data from multiple sources. Additionally, even if we could measure exactly, there are still manufacturing tolerances, which cannot be dealt with this way.
For this reason, we have decided to perform a thorough calibration of the sensors. Some of the methods used estimate the intrinsic parameters of individual devices, while others are designed to obtain their mutual poses. The methods used, their settings, the calibration data and the best estimates of the desired parameters are provided along with the dataset on its website (https://github.com/RoboticsBUT/Brno-Urban-Dataset-Calibrations). We expect the sensory equipment to change over time and will update this material accordingly.
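The sensitivity to orientation errors is easy to quantify: an angular error δθ displaces a point at range d by about d·tan(δθ). For example, 0.5° at 50 m already gives roughly 0.44 m, enough to misplace a pedestrian between modalities. The angle and range here are illustrative only.

```python
import math

def lateral_misalignment_m(range_m, angle_error_deg):
    """Displacement of a point at range_m caused by an orientation error."""
    return range_m * math.tan(math.radians(angle_error_deg))

# Illustrative values: half a degree of mounting error seen at 50 m.
offset_50m = lateral_misalignment_m(50.0, 0.5)
```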
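As an illustration of how intrinsic and extrinsic calibration results are typically combined, the sketch below projects LiDAR points into a camera image with a 4×4 homogeneous LiDAR-to-camera transform and a 3×3 pinhole intrinsic matrix. The matrices are hypothetical placeholders (a LiDAR frame with x forward, y left, z up and a camera frame with z forward, x right, y down); the actual values and file format are documented in the calibration repository and are not assumed here. Lens distortion is ignored.

```python
import numpy as np

def project_lidar_to_image(points_lidar, T_cam_lidar, K):
    """Project N x 3 LiDAR points to pixel coordinates with a pinhole model.

    T_cam_lidar: 4x4 homogeneous transform from the LiDAR to the camera frame.
    K: 3x3 intrinsic matrix. Returns (M x 2 pixels, boolean mask of the
    points lying in front of the camera).
    """
    pts = np.asarray(points_lidar, dtype=float)
    homogeneous = np.hstack([pts, np.ones((pts.shape[0], 1))])
    cam = (T_cam_lidar @ homogeneous.T).T[:, :3]
    in_front = cam[:, 2] > 0.0  # discard points behind the image plane
    uvw = (K @ cam[in_front].T).T
    pixels = uvw[:, :2] / uvw[:, 2:3]
    return pixels, in_front

# Hypothetical calibration: pure axis permutation, no translation.
T_cam_lidar = np.array([[0.0, -1.0, 0.0, 0.0],
                        [0.0, 0.0, -1.0, 0.0],
                        [1.0, 0.0, 0.0, 0.0],
                        [0.0, 0.0, 0.0, 1.0]])
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 600.0],
              [0.0, 0.0, 1.0]])
pixels, mask = project_lidar_to_image(
    [[10.0, 0.0, 0.0], [10.0, 1.0, 0.0], [-5.0, 0.0, 0.0]], T_cam_lidar, K)
```

A point straight ahead lands at the principal point, a point 1 m to the left at 10 m is shifted 100 px to the left, and the point behind the car is discarded.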

D. Synchronization and Timestamping
As already discussed in Sec. I, exact timing in data acquisition systems is, to some extent, replaceable by dedicated algorithms. We prefer to prevent problems rather than fix them, and the ATLAS platform was built with precise timing in mind. The scheme in Fig. 2, which shows the data flow in the system, also plots the synchronization and time distribution lanes (dashed arrows).
The key source of precise time in our system is the GPS signal. Even the most basic receiver needs to maintain time precision on a nanosecond scale to provide usable positioning. The Trimble RTK receiver obviously has access to it, and the Velodyne laser scanners and the Xsens IMU are equipped with small antennas as well, so precise time is available to them directly. For devices without a GPS receiver, we have designed a synchronization unit, which is clocked by a precise clock source from the Trimble receiver. It can either capture an input trigger, pair it with an exact time and send a packet to the recording computer, or generate an output trigger signal with a given frequency and send a timestamp corresponding to each firing. The unit is built around a simple microcontroller without an operating system or nested interrupts, which keeps the timing of its routines transparent and guarantees an upper bound on the mismatch between timestamp and signal. Taking into account hardware propagation delays and similar effects, the error is well below 1 ms. With all these precautions, we can completely bypass the system time of the control computer and stamp the data with time from a trustworthy source.
Currently, we use the synchronization unit to trigger all RGB cameras with a common signal. Unfortunately, the thermal camera can neither be triggered by an external signal nor provide an output trigger, but it contains a precise clock that allows additional corrections. The resulting list of timestamps is then paired with a timestamp sequence from the synchronization unit, which provides interpolated timing for each frame.
An obvious drawback of this approach is a strong dependence on GPS signal availability. For this reason, all devices drawing GPS time employ a graceful fallback to a local clock source with known accuracy. In the worst case, the timing error exceeds the 1 ms limit after tens of minutes without GPS signal, which is enough to pass through tunnels and other problematic locations and regain exact time. We take great care not to exceed this limit in any of our recordings.
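The output-trigger mode can be sketched in a few lines. The example below is hypothetical and not the firmware of the ATLAS unit: it computes the nanosecond timestamps of periodic trigger pulses from a GPS-aligned start time and then attaches them to camera frames, assuming (as an illustration) that each camera reports a frame counter that increments once per pulse and that no pulse is missed.

```python
def trigger_timestamps_ns(start_ns, frequency_hz, count):
    """Timestamps of `count` periodic trigger pulses in integer nanoseconds.

    Each pulse time is computed from the start time rather than by adding a
    rounded period repeatedly, so rounding errors do not accumulate.
    frequency_hz is assumed to be a positive integer.
    """
    return [start_ns + (k * 1_000_000_000) // frequency_hz for k in range(count)]

def pair_frames_with_triggers(frame_ids, trigger_times_ns, first_frame_id=0):
    """Map each frame counter value to the time of the pulse that exposed it.

    Hypothetical pairing rule: frame number n corresponds to pulse
    n - first_frame_id.
    """
    return {fid: trigger_times_ns[fid - first_frame_id] for fid in frame_ids}

# 10 Hz trigger starting at an arbitrary GPS-aligned time (hypothetical).
pulses = trigger_timestamps_ns(start_ns=1_000, frequency_hz=10, count=3)
frame_times = pair_frames_with_triggers([5, 6], pulses, first_frame_id=5)
```

The point of the design is that the timestamp comes from the trigger source, not from the arrival time of the frame at the non-real-time recording PC.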
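One way to realize such a correction, assuming that pairs of (internal camera clock, GPS time) samples are available, is to fit a linear clock model (a constant rate error plus an offset) by least squares and map every frame timestamp through it. This is an illustrative technique, not necessarily the exact procedure used for the ATLAS thermal camera, and the function names are hypothetical.

```python
import numpy as np

def fit_clock_model(device_times, reference_times):
    """Least-squares fit of reference = rate * device + offset."""
    rate, offset = np.polyfit(np.asarray(device_times, dtype=float),
                              np.asarray(reference_times, dtype=float), 1)
    return float(rate), float(offset)

def to_reference_time(device_times, rate, offset):
    """Map device-clock timestamps onto the reference (GPS) time base."""
    return rate * np.asarray(device_times, dtype=float) + offset

# Synthetic example: device clock running 20 ppm fast with a constant offset.
device_t = np.linspace(0.0, 10.0, 11)
reference_t = 1.00002 * device_t + 123.456
rate, offset = fit_clock_model(device_t, reference_t)
frame_gps_t = to_reference_time([5.5], rate, offset)
```

A linear model is adequate over short spans; for long recordings the fit can be repeated over sliding windows to follow temperature-dependent drift.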
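The holdover time follows from simple arithmetic: a free-running clock with a constant frequency error of r parts per million accumulates r microseconds of error per second. The drift values below are hypothetical examples, not specifications of the ATLAS devices; a 1 ppm oscillator reaches the 1 ms limit after 1000 s (about 17 minutes), a 0.5 ppm one after about 33 minutes, consistent with the "tens of minutes" stated above. The model assumes the clock is perfectly aligned at the moment of signal loss and ignores temperature-dependent drift changes.

```python
def holdover_seconds(error_limit_s, drift_ppm):
    """Time until a clock with constant frequency error exceeds error_limit_s."""
    return error_limit_s / (drift_ppm * 1e-6)

# Hypothetical oscillator drifts against the 1 ms budget.
minutes_1ppm = holdover_seconds(1e-3, 1.0) / 60.0
minutes_half_ppm = holdover_seconds(1e-3, 0.5) / 60.0
```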

IV. DATASET
As mentioned above, the dataset was recorded in Brno, Czech Republic. The ATLAS platform is not yet entirely waterproof, so the range of weather conditions captured is limited. On the other hand, thanks to the moderate size of the city, the environmental diversity spans from natural, countryside-like locations to the city center with historical buildings, public transport and, especially, heavy traffic and many pedestrians.

A. Content
It is good practice to sort the data according to their content. The time of recording serves mostly as a unique identifier, and a brief description is useful for a quick overview of a recording, but both are cumbersome to use when the whole database of recordings needs to be searched. For this reason, we have employed a system of tags, which highlight the most important content and enable easy filtering of the recordings, as summarized in Tab. III. So far, we have made available 67 recordings with a total length of 375.7 km and a total duration of more than 10 hours.
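Tag-based filtering can be sketched as simple set operations. The record structure, recording names and tag strings below are hypothetical; the actual tags are those listed in Tab. III.

```python
from dataclasses import dataclass, field

@dataclass
class Recording:
    """Hypothetical metadata entry for one recording."""
    name: str
    length_km: float
    tags: set = field(default_factory=set)

def filter_recordings(recordings, required=(), excluded=()):
    """Keep recordings that carry all required tags and none of the excluded."""
    required, excluded = set(required), set(excluded)
    return [r for r in recordings
            if required <= r.tags and not (excluded & r.tags)]

# Hypothetical catalogue entries.
catalogue = [
    Recording("rec_a", 12.0, {"highway", "sunny"}),
    Recording("rec_b", 5.5, {"city", "night"}),
    Recording("rec_c", 8.1, {"city", "sunny"}),
]
daytime_city = filter_recordings(catalogue, required={"city"}, excluded={"night"})
total_km = sum(r.length_km for r in daytime_city)
```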

B. Data Structure
The structure of a single recording follows the scheme in Fig. 5. RGB and thermal camera data are distributed as H.265 video, and the LiDAR scans are compressed into .zip archives to reduce their size as much as possible. Other data, such as timestamps and calibration files, occupy a negligible amount of memory and are stored in human-readable .txt and .yaml files.
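Reading scans from such an archive needs only the standard `zipfile` module. The sketch below assumes a hypothetical member format, one flat little-endian float32 buffer per scan with a fixed number of values per point (e.g. x, y, z, intensity); the actual per-scan layout is documented in the dataset repository and should be checked before use.

```python
import zipfile
import numpy as np

def read_scans(zip_source, fields=4):
    """Yield (member name, N x fields float32 array) for each scan in a .zip.

    zip_source may be a path or a binary file-like object. The member format
    (flat little-endian float32, `fields` values per point) is an assumption.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):  # skip directory entries
                continue
            raw = archive.read(name)
            yield name, np.frombuffer(raw, dtype="<f4").reshape(-1, fields)
```

Using a generator keeps memory usage bounded, since only one decompressed scan is held at a time.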

C. Software and Development Tools
The recording session runs fully on the Robot Operating System (ROS). This allows us to create a highly scalable solution that is compatible with the many other projects using ROS as a baseline for developing robotics applications. To reach a wider audience and make the data more convenient to use, we publish the data in an easily readable raw format and provide a script for conversion into ROS bags.
We also provide a set of Python/OpenCV-based scripts that help, for example, to split the video data into separate frame files or to draw the trajectory in Google Maps. The software is available from https://github.com/RoboticsBUT/Brno-Urban-Dataset-Tools, or as a submodule of the dataset git repository in the tools/ folder.
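Working with the GNSS trajectory often starts with simple quantities such as its length. The sketch below sums great-circle (haversine) distances between consecutive fixes on a spherical Earth, assuming a list of (latitude, longitude) pairs in degrees; parsing them from the dataset files is not shown, and the helper names are hypothetical, not part of the published tools.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def trajectory_length_km(fixes):
    """Sum of distances between consecutive (lat, lon) fixes, in kilometres."""
    return sum(haversine_m(*p, *q) for p, q in zip(fixes, fixes[1:])) / 1000.0
```

At centimetre-level RTK accuracy, the spherical model (error up to a few tenths of a percent) dominates the error budget; an ellipsoidal geodesic library would be preferable for precise distance work.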

V. CONCLUSION AND FUTURE WORK
In this paper, we have presented a new dataset for autonomous driving research recorded in Brno, Czech Republic. We provide state-of-the-art sensory measurements with three key additions that go beyond other available datasets. First, most routes where the GNSS signal was available are accompanied by centimetre-accurate global position and heading from a differential GNSS receiver. The second important feature is the synchronization and timestamping of the data with sub-millisecond precision, which allows simpler data fusion and the evaluation of algorithms in which temporal shifts in measurements are part of the stochastic model. The last addition is the infrared camera, which significantly increases the detection and recognition capabilities of the ATLAS measurement platform.
At the time of writing, 67 recordings with a total length of 375.7 km and a total duration of more than 10 hours are available. The recordings are tagged with respect to the environment, weather and other events encountered, and are provided in an easily usable format through the project page: https://github.com/RoboticsBUT/Brno-Urban-Dataset. Data collection is a long-term process, and we expect the dataset to grow over time with new recordings required by our research. We plan to densely cover a smaller region in Brno for map-building applications and to obtain more data acquired in problematic lighting and weather conditions, as well as winter sessions. We are also considering waterproofing the ATLAS platform to cover the full range of weather conditions encountered in Central Europe.