This thesis investigates video object detection with key-point-based approaches. Recognizing, locating, and tracking objects in videos remains a challenging task in computer vision, and key-point-based object detectors such as CornerNet and CenterNet have seen few applications to it. In the first stage, this work uses the previously proposed CenterNet as a baseline detector on each frame of the ImageNet Video dataset. We then apply an RNN module that exploits temporal information from past frames to improve the results.

Video object detection poses challenges that still-image object detection does not. Applying a still-image detector to each frame independently is suboptimal: neighboring frames in a video are highly correlated, yet a per-frame detector cannot exploit this temporal context. Detection in video also suffers from motion blur, defocus, rare poses, and similar degradations. One way to overcome these issues and improve CenterNet for video object detection is to propagate reliable detection results from previous frames to boost detection performance on the current frame.
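
The idea of propagating reliable detections from a previous frame can be illustrated with a minimal sketch. This is not the RNN module used in the thesis; it swaps in a simple rule-based score update for illustration only. The names `Detection`, `iou`, and `propagate`, and the thresholds `reliable`, `min_iou`, and `weight`, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


@dataclass
class Detection:
    label: str
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def propagate(prev: List[Detection], curr: List[Detection],
              reliable: float = 0.6, min_iou: float = 0.5,
              weight: float = 0.5) -> List[Detection]:
    """Raise the score of a current detection that overlaps a reliable
    detection of the same class in the previous frame (assumed rule)."""
    out = []
    for det in curr:
        # Best-scoring reliable, same-class, overlapping detection, if any.
        support = max(
            (p.score for p in prev
             if p.label == det.label
             and p.score >= reliable
             and iou(p.box, det.box) >= min_iou),
            default=None,
        )
        if support is not None and support > det.score:
            new_score = (1 - weight) * det.score + weight * support
        else:
            new_score = det.score
        out.append(Detection(det.label, det.box, new_score))
    return out


prev_frame = [Detection("dog", (10, 10, 50, 50), 0.9)]
curr_frame = [Detection("dog", (12, 11, 52, 49), 0.4),   # e.g. motion-blurred
              Detection("cat", (100, 100, 140, 140), 0.3)]
boosted = propagate(prev_frame, curr_frame)
```

In this sketch the blurred "dog" detection inherits confidence from its confident counterpart in the previous frame, while the unmatched "cat" detection keeps its score. A learned recurrent module, as in the thesis, would replace the fixed rule with features aggregated over past frames.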