Main idea
1. Problem Statement
When training and test data are drawn from different distributions, the performance of traditional object detection models drops significantly.
2. Proposed Solution
The Domain Adaptive Faster R-CNN model can be used to effectively detect objects across domains. Domain variation might result from differences in camera type, weather conditions, object appearance, image quality, backgrounds, etc. The following variations were considered here:
- Image-level variation, such as image style, illumination, etc.
- Instance-level variation, such as object appearance, size, etc.
Resources
Datasets: Cityscapes, KITTI, SIM 10k, etc.
Technology
1. Domain Adaptation
- Image level Adaptation
In the Faster R-CNN model, the image-level representation refers to the feature map outputs of the base convolutional layers (see the green parallelogram in Fig 1). Domain distribution mismatch at the image level was reduced by employing a patch-based domain classifier, as shown in the lower-right part of Fig 1.
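The patch-based image-level classifier can be sketched in PyTorch. This is a minimal sketch, not the paper's code: the layer widths, the channel count `in_channels=512`, and the gradient-reversal weight `lambd` are assumptions for illustration. The key idea is the gradient reversal layer (GRL), which makes the backbone learn domain-invariant features adversarially.

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    # Gradient Reversal Layer: identity in the forward pass, but negates
    # (and scales) gradients in the backward pass, so minimizing the domain
    # loss in the classifier maximizes it for the backbone.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class ImageLevelDomainClassifier(nn.Module):
    """Patch-based domain classifier: one source-vs-target logit per
    spatial location of the backbone feature map (sizes are assumptions)."""

    def __init__(self, in_channels=512, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.conv1 = nn.Conv2d(in_channels, 256, kernel_size=1)
        self.conv2 = nn.Conv2d(256, 1, kernel_size=1)  # 1 logit per patch

    def forward(self, feat):
        x = GradReverse.apply(feat, self.lambd)
        x = torch.relu(self.conv1(x))
        return self.conv2(x)  # (N, 1, H, W) patch-wise domain logits
```

Each spatial position of the output is treated as one "patch" prediction, which is what makes the classifier patch-based rather than a single per-image decision.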
- Instance Level Adaptation
The instance-level representation refers to the ROI-based feature vectors before they are fed into the final category classifiers (i.e., the rectangles after the “FC” layer in Fig 1). Aligning the instance-level representations helps reduce local instance differences such as object appearance, size, and viewpoint. A domain classifier was trained on these feature vectors to align the instance-level distribution.
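The instance-level classifier operates on per-ROI vectors rather than feature maps, so it is a small fully connected network. A minimal sketch, again with gradient reversal; the input dimension and hidden sizes are assumptions, not values from the paper:

```python
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    # Identity forward; reversed gradient backward (adversarial alignment).
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output


class InstanceLevelDomainClassifier(nn.Module):
    """Per-ROI domain classifier (hidden sizes are assumptions)."""

    def __init__(self, in_dim=2048):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(in_dim, 1024),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(1024, 1),  # one source-vs-target logit per ROI
        )

    def forward(self, roi_feats):          # (num_rois, in_dim)
        x = GradReverse.apply(roi_feats)   # reverse gradients into backbone
        return self.fc(x)                  # (num_rois, 1)
```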
2. Faster R-CNN
It is composed of three major components: shared bottom convolutional layers, a region proposal network (RPN), and a region-of-interest (ROI) based classifier.
3. Final Network
Domain Adaptive Faster R-CNN was obtained by augmenting Faster R-CNN with the domain adaptation components.
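The combined training objective can be sketched as the detection loss plus weighted adaptation terms. The trade-off weight `lam=0.1` below is an assumed illustrative value, not taken from the paper:

```python
def total_loss(det_loss, img_da_loss, ins_da_loss, cst_loss, lam=0.1):
    """Combined objective (sketch): detection loss plus lambda-weighted
    image-level, instance-level, and consistency adaptation losses."""
    return det_loss + lam * (img_da_loss + ins_da_loss + cst_loss)
```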
Findings
- When trained on synthetic data (training data captured from a video game, while the test data comes from the real world), the model achieved a +7.7% gain. Applying consistency regularization improved the gain to +8.8%.
- For driving in adverse weather (training data taken in good weather conditions, test data in foggy weather), the model achieved a +8.6% gain.
- For cross-camera adaptation (training and test data captured with different camera setups), the two datasets KITTI and Cityscapes were used, resulting in a variation in camera setup. The model performed better than other models, as shown in Fig 2 below. (KITTI to Cityscapes is denoted as K -> C.)
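The consistency regularization mentioned in the findings can be sketched as follows. This is a minimal sketch assuming sigmoid domain classifiers: it pushes each ROI's instance-level domain probability toward the image's average patch-level prediction, so the two classifiers agree.

```python
import torch


def consistency_loss(img_logits, ins_logits):
    """Consistency regularizer (sketch): penalize disagreement between
    the mean image-level domain probability and each ROI's probability."""
    img_prob = torch.sigmoid(img_logits).mean()        # scalar: mean patch prob
    ins_prob = torch.sigmoid(ins_logits).reshape(-1)   # (num_rois,)
    return ((ins_prob - img_prob) ** 2).mean()
```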