2020/6/1: I will present our project “Controlling Length in Image Captioning” at the VQA workshop this year. The model is a little behind the times because the work was mostly done a year ago. I picked it up again because it fits into my thesis.

2020/3/27: Our paper “Detection and Description of Change in Visual Streams” is on arxiv now.

2020/3/22: My technical report “A Better Variant of Self-Critical Sequence Training” is on arxiv now. It is a simple yet effective improvement upon SCST.

2020/1/24: Our paper “Pixel Consensus Voting for Panoptic Segmentation” is accepted by CVPR 2020. The arxiv link is here.

2019/11/10: Our paper “Context-Aware Zero-Shot Recognition” is accepted by AAAI 2020.

2019/10/27: Our PCV team won the Innovation Award on the COCO panoptic segmentation track of the COCO + Mapillary Challenge Workshop at ICCV 2019.

2019/10/23: I am collaborating with Hang Chu (main contributor) on a podcast, Daily Arxiv Radiostation. Every day, the podcast uses TTS to read the titles and abstracts of the papers that an algorithm picks. Subscriptions are welcome. Chinese introduction here.

2019/10/02: Our paper “Analysis of diversity-accuracy tradeoff in image captioning” will be presented at the ICCV 2019 CLVL workshop.

2019/08/01: Our high-resolution RGB-D dataset is released, with unrealistically accurate depth maps and surface normals. More details can be found at DIODE.

2019/06/14: I am invited to give a talk at the Conceptual Captions Challenge Workshop at CVPR 2019 as a member of the challenge-winning team. The slides can be found here.

2019/04/24: Our paper “Context-Aware Zero-Shot Recognition” is on arxiv now. Code is available at [link].

2018/06/11: This year I am interning at Snap Research, working with Linjie Yang, Ning Zhang and Bohyung Han.

2018/02/20: Paper “Discriminability objective for training descriptive captions” is accepted to CVPR 2018. [link]

2017/06/12: I start my internship at Adobe Research, working with Scott Cohen and Brian Price. First time in the Bay Area.

2017/02/27: My recent work “Comprehension-guided referring expressions” is accepted by CVPR 2017. [link]

2015/12/14: I built a blog on GitHub. Link to the blog.

2015/09/21: I start my new life at TTI-C and in Chicago.

2015/04/04: I have accepted the Ph.D. admission offer from Toyota Technological Institute at Chicago. See you in Chicago!


More Publications

Controlling Length in Image Captioning, Technical Report 2020.

PDF Code Slides Video

Detection and Description of Change in Visual Streams, arxiv 2020.


Pixel Consensus Voting for Panoptic Segmentation, CVPR 2020.

PDF Code Project Slides

Context-Aware Zero-Shot Recognition, AAAI 2020.

PDF Code Poster

DIODE: A Dense Indoor and Outdoor DEpth Dataset, arxiv preprint 2019.

PDF Dataset

Discriminability objective for training descriptive captions, CVPR 2018.

PDF Code Poster Slides supp

A Multi-task Learning Approach for Image Captioning, IJCAI 2018.


Comprehension-guided referring expressions, CVPR 2017.

PDF Poster



An image captioning codebase in PyTorch.


A Faster R-CNN implementation in PyTorch.


Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning.


MobileNet converted from TensorFlow.


Converts a ResNet trained in Caffe to a PyTorch model.



This is how to pronounce my name in Mandarin (in the order of last name + first name). People usually call me RT (the initials of my first name).

My Zhihu account.