1. General obstacle detection with video and radar systems:

“Obstacle detection” means obtaining a spatial representation of the vehicle environment that localises any kind of object (i.e. VRUs, but also infrastructure and vehicles). PROSPECT uses novel video- and radar-based environmental sensors. Combining the video- and radar-specific information streams increases robustness and overall system performance. It is the trigger for next-generation active safety VRU protection features (i.e. extending the vehicle control from braking to steering) and paves the way towards fully automated driving in the near future. While advanced video methods increasingly rely on deep learning with convolutional neural networks (CNNs), the radar techniques still rely on classical approaches due to the very early development stage. Within PROSPECT, the video component has been developed in close collaboration by Daimler and the University of Amsterdam, and the radar component by Bosch and Continental researchers.

For generic video-based obstacle detection, Daimler uses the stixel world within the PROSPECT project. Stixels are an efficient and sparse representation of objects with approximately vertical surfaces and can thus be used to represent e.g. VRUs and cars. For an example, see Fig. 1, where the different colours encode the class of each stixel.

Radar obstacle detection works by transmitting electromagnetic waves and receiving the reflections from objects within the sensor’s observation zone. It also uses the Doppler effect to measure the relative speed of the objects directly. Important steps towards detection and classification of VRUs with radar sensor systems in complex scenarios were extending the field of view of the individual sensors, or placing sensors at different locations on the vehicle and fusing their detection results into one source, and improving their resolution with integrated hardware components and rapidly increasing computation power.
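The Doppler relation mentioned above can be written down in a few lines. The following is a minimal sketch for a monostatic radar, not the PROSPECT implementation: the 77 GHz carrier frequency, the function names and the sign convention (positive shift for an approaching object) are assumptions chosen for illustration.

```python
C = 299_792_458.0  # speed of light in m/s


def radial_speed_from_doppler(doppler_shift_hz: float, carrier_hz: float = 77e9) -> float:
    """Relative radial speed in m/s from a measured Doppler shift.

    Uses v = f_d * c / (2 * f_c); the factor 2 accounts for the two-way path.
    The 77 GHz default carrier is an assumption, not a value from the text.
    """
    return doppler_shift_hz * C / (2.0 * carrier_hz)


def doppler_from_radial_speed(speed_mps: float, carrier_hz: float = 77e9) -> float:
    """Inverse relation: Doppler shift in Hz produced by a radial speed in m/s."""
    return 2.0 * speed_mps * carrier_hz / C


# A pedestrian walking at 1.5 m/s towards the sensor (hypothetical value).
shift_hz = doppler_from_radial_speed(1.5)
speed_mps = radial_speed_from_doppler(shift_hz)
```

Under these assumptions, a radial speed of 1 m/s corresponds to a shift of roughly 514 Hz, which illustrates why slow VRU movements are directly measurable.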

Fig. 1: Semantic stixel representation where the image is segmented into drivable road, sky, and vertical “sticks”

2. Video and radar-specific VRU classification and tracking:

Object instance information is inferred by classical bounding box detection. For most box detection methods, accurate box proposal generation is crucial. Within PROSPECT, Daimler presented an efficient stereo proposal generation method that meets the runtime requirements and achieves high detection performance with the classification methods used.

The work performed by Bosch, on the other hand, addressed the evaluation of the micro-Doppler signature and the exploitation of multiple-input multiple-output (MIMO) systems. The introduction of these two new measurement techniques in the last few years has led to significant improvements in automotive radar sensors. In particular, the new “Fast Chirp Sequence” modulation scheme, which is capable of exploiting the micro-Doppler signals received from moving objects, enables VRU classification and tracking by radar sensors. In the range-velocity representation, the reflection of each traffic participant has a characteristic micro-Doppler signature that mainly depends on the individual movements of the radar-illuminated parts (see Fig. 2 for the µ-Doppler signatures of a cyclist in the range-Doppler plot).
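To illustrate how a range-velocity representation can be obtained from a sequence of chirps, here is a minimal NumPy sketch. It is not the Bosch processing chain: it uses the common textbook approach of a 2D FFT over the beat samples (fast time gives range, chirp index gives Doppler), and the synthetic target, all bin values and the function names are hypothetical. The second, weaker component at the same range but a different Doppler bin stands in for a moving limb or wheel.

```python
import numpy as np


def range_doppler_map(beat_samples: np.ndarray) -> np.ndarray:
    """Magnitude range-Doppler map from samples of shape (n_chirps, n_samples).

    FFT along axis 1 (fast time) resolves range, FFT along axis 0 (chirp index)
    resolves Doppler. The Doppler axis is shifted so zero velocity is centred.
    """
    n_chirps, n_samples = beat_samples.shape
    window = np.outer(np.hanning(n_chirps), np.hanning(n_samples))
    range_spectrum = np.fft.fft(beat_samples * window, axis=1)
    doppler_spectrum = np.fft.fft(range_spectrum, axis=0)
    return np.abs(np.fft.fftshift(doppler_spectrum, axes=0))


def synthetic_target(n_chirps, n_samples, range_bin, doppler_bin, amplitude=1.0):
    """Idealised point reflector placed exactly on a range and Doppler bin."""
    chirp_idx = np.arange(n_chirps)[:, None]
    sample_idx = np.arange(n_samples)[None, :]
    phase = 2j * np.pi * (range_bin * sample_idx / n_samples
                          + doppler_bin * chirp_idx / n_chirps)
    return amplitude * np.exp(phase)


N_CHIRPS, N_SAMPLES = 64, 128  # hypothetical frame size
# Torso: strong reflection; limb: weaker, same range, higher Doppler bin.
frame = (synthetic_target(N_CHIRPS, N_SAMPLES, range_bin=20, doppler_bin=5)
         + synthetic_target(N_CHIRPS, N_SAMPLES, range_bin=20, doppler_bin=12,
                            amplitude=0.4))

rd_map = range_doppler_map(frame)
peak_doppler, peak_range = (
    int(i) for i in np.unravel_index(np.argmax(rd_map), rd_map.shape)
)
```

In this sketch the spread of energy along the Doppler axis within one range bin is what a micro-Doppler feature analysis would look at; real signatures vary over time and are far richer than two fixed bins.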

In naturalistic observation studies and individual test campaigns conducted by the PROSPECT partners, the specific motion patterns of pedestrians and pedalling cyclists were recorded and analysed in depth.

Fig. 2: Obstacle detection with radar systems – generation of µ-Doppler signatures for feature analysis

3. Video- and radar-based VRU intent-related feature extraction:

Knowing in advance where pedestrians or other traffic participants will be is clearly desirable, and intent-related feature extraction makes this possible. Gait patterns and limb speed profiles are good indicators that both video and radar sensors can evaluate. While the underlying physical principle for intent-related feature extraction is identical for video and radar sensors, the evaluation techniques used are completely different and adapted to the video- and radar-specific processing methods.

With Pose-RCNN, Daimler has added an orientation estimation task that extracts the object orientation jointly with the object detection in a single CNN. Besides the object orientation, Daimler extracted skeleton cues as a basis for higher-level intention cues and/or human gestures (e.g. the arm signal of a cyclist in Fig. 3).

Like video-based intent recognition, radar-based intent recognition evaluates metrics such as gait speed variation, but with radar-specific means. A decrease in gait speed can be observed 500 milliseconds before the pedestrian comes to a complete standstill. This information can be used to predict the intention of a pedestrian up to half a second ahead. AEB-VRU safety systems can thus trigger much earlier and mitigate or avoid VRU crashes more effectively.
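The idea of using a gait speed decrease as an early stopping cue can be sketched as a simple threshold test on a speed time series. This is an illustrative assumption, not the detector used in PROSPECT: the function name, the one-second baseline window, the 0.85 drop ratio and the sample values are all hypothetical.

```python
from typing import Optional, Sequence


def first_slowdown_index(
    speeds_mps: Sequence[float],
    dt_s: float,
    baseline_window_s: float = 1.0,
    drop_ratio: float = 0.85,
) -> Optional[int]:
    """Index of the first sample whose speed falls below drop_ratio times the
    mean speed of the preceding baseline window, or None if there is none."""
    n = max(1, round(baseline_window_s / dt_s))
    for i in range(n, len(speeds_mps)):
        baseline = sum(speeds_mps[i - n:i]) / n
        if baseline > 0 and speeds_mps[i] < drop_ratio * baseline:
            return i
    return None


DT_S = 0.1  # hypothetical measurement interval
# Steady walking at 1.4 m/s, then a linear slowdown to standstill over 500 ms.
track = [1.4] * 20 + [1.12, 0.84, 0.56, 0.28, 0.0]

cue_index = first_slowdown_index(track, DT_S)
standstill_index = track.index(0.0)
lead_time_s = (standstill_index - cue_index) * DT_S
```

On this synthetic track the cue fires at the first slowed sample, several hundred milliseconds before standstill. A practical detector would additionally need smoothing against measurement noise.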

Fig. 3: Video-based intention-related features which help to better predict the behaviour of the VRU

4. Video and radar-specific VRU path prediction and intent recognition:

Detailed and exact information on the location and moving direction of the VRUs around the target vehicle is needed to issue appropriate warnings and take appropriate control actions.

Daimler worked on a Dynamic Bayesian Network (DBN) to combine the various behavioural indicators extracted from the video system.
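As a rough illustration of how a DBN can fuse behavioural indicators over time, the sketch below implements the simplest special case: a discrete Bayes filter over a two-state latent intent. It is not Daimler's model. The states, the cue names, every probability and the assumption that cues are conditionally independent given the intent are hypothetical and chosen only for illustration.

```python
import numpy as np

STATES = ("continue", "stop")

# P(next state | current state); rows are the current state. Hypothetical values.
TRANSITION = np.array([[0.90, 0.10],
                       [0.05, 0.95]])

# P(cue is observed | state), one entry per state. Hypothetical values.
CUE_LIKELIHOOD = {
    "slowing_down": np.array([0.1, 0.8]),
    "head_towards_vehicle": np.array([0.3, 0.7]),
}


def filter_step(belief: np.ndarray, cues: dict) -> np.ndarray:
    """One predict-update step: propagate the belief through the transition
    model, then weight it by the likelihood of the observed binary cues."""
    predicted = TRANSITION.T @ belief
    likelihood = np.ones(len(STATES))
    for name, observed in cues.items():
        p = CUE_LIKELIHOOD[name]
        likelihood = likelihood * (p if observed else 1.0 - p)
    posterior = predicted * likelihood
    return posterior / posterior.sum()


prior = np.array([0.5, 0.5])
stopping = filter_step(prior, {"slowing_down": True, "head_towards_vehicle": True})
walking = filter_step(prior, {"slowing_down": False, "head_towards_vehicle": False})
```

Calling `filter_step` once per video frame accumulates evidence over time, which is the main benefit of a dynamic model over a per-frame classifier.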

Radar-based path prediction and intent recognition are mainly based on a grid-based environmental model of static obstacles (i.e. fixed roadside objects) and dynamic obstacles (i.e. moving traffic participants), and on the evaluation of characteristic motion patterns from micro-Doppler measurements.
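A grid-based model that separates static from dynamic obstacles can be sketched as follows. This is a deliberately simplified assumption, not the PROSPECT environment model: detections are taken to be already in vehicle coordinates with an ego-motion-compensated speed, and the cell size, grid extent, speed threshold and label encoding are hypothetical.

```python
import numpy as np

FREE_OR_UNKNOWN, STATIC, DYNAMIC = 0, 1, 2


def build_grid(detections, cell_size_m=0.5, extent_m=20.0, speed_threshold_mps=0.3):
    """Rasterise radar detections (x_m, y_m, ground_speed_mps) into a square
    grid centred on the vehicle. A cell is DYNAMIC if any detection in it
    moves faster than the threshold, STATIC if it only holds slow detections."""
    n = int(round(2 * extent_m / cell_size_m))
    grid = np.zeros((n, n), dtype=np.uint8)
    for x_m, y_m, speed_mps in detections:
        col = int((x_m + extent_m) // cell_size_m)
        row = int((y_m + extent_m) // cell_size_m)
        if 0 <= row < n and 0 <= col < n:  # ignore detections outside the grid
            label = DYNAMIC if abs(speed_mps) > speed_threshold_mps else STATIC
            grid[row, col] = max(int(grid[row, col]), label)
    return grid


# Hypothetical detections: a roadside post, a walking pedestrian, and one
# far-away reflection that falls outside the grid.
grid = build_grid([(0.0, 0.0, 0.0), (5.2, -3.1, 1.4), (100.0, 0.0, 0.0)])
```

Path prediction would then operate on the dynamic cells while treating the static cells as constraints on where a VRU can move.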