Yandong Li  (李延东)

Ph.D. student in Computer Science, University of Central Florida (UCF)
Center for Research in Computer Vision (CRCV)
lyndon.leeseu@outlook.com
Google Scholar

Education

  • 2017.08 - present
    Ph.D. in Computer Science, University of Central Florida (UCF), Orlando, USA
    Advised by Dr. Boqing Gong (Tencent AI Lab) and Prof. Liqiang Wang (UCF)
  • 2012.09 - 2016.07
    Bachelor's degree in Computer Software Engineering, Southeast University (SEU), Nanjing, China
    GPA: 3.70/4.0  Rank: 3/139
    Received the Outstanding Undergraduate Student award and the National Scholarship

Research Experiences

My research interests lie in machine learning and computer vision. Specifically, I have practical experience in large-scale video analysis and vision-language embedding, including visual question answering. Before coming to the US, I worked at Microsoft Research Asia and the Baidu Institute of Deep Learning, where our team won 1st place in the ActivityNet Kinetics challenge and 3rd place in the Google Cloud & YouTube-8M large-scale video understanding challenge, two of the premier competitions in action recognition. More recently, I have been exploring reinforcement learning for sequential models in video summarization.
  • University of Central Florida (UCF), Center for Research in Computer Vision, 2017.08 - present
    Video summarization and model visualization, supervised by Dr. Boqing Gong
    • Reinforcing sequential DPPs for video summarization
    • Visualizing action recognition models
  • Baidu Institute of Deep Learning, Genome Group, 2017.02 - 2017.08
    ActivityNet Kinetics and YouTube-8M challenges, action recognition, supervised by Dr. Chuang Gan
    • Our team won 3rd place in the YouTube-8M challenge and 1st place in the ActivityNet challenge, two of the premier competitions in this area.
  • Microsoft Research Asia, Multimedia Search and Mining Group, 2015.07 - 2016.07
    Research on cross-modality hashing and video thumbnail tools, mentored by Dr. Ting Yao and Dr. Tao Mei
    • Video thumbnail tools: participated in the development of Microsoft Cognitive Services
    • Hashing: the deep cross-modal hashing work was recognized with the Outstanding Bachelor's Dissertation award

Publications

  • How Local is the Local Diversity? Reinforcing Sequential Determinantal Point Processes with Dynamic Ground Sets for Supervised Video Summarization
    Yandong Li, Boqing Gong. To be submitted to ECCV, 2018.

  • TXN: Temporal Xception Network for Large Scale Video Action Recognition (CVPR 2018 submission)

  • Multimodal Keyless Attention Fusion for Video Classification
    Xiang Long*, Chuang Gan, Gerard de Melo, Xiao Liu, Yandong Li, Fu Li, Shilei Wen. (AAAI'18), New Orleans, USA, Feb 2018.
    We study various approaches to fusing multimodal RNNs and where an attention mechanism can be placed in the framework. We find that multimodal keyless attention is the most effective at capturing multimodal semantic information.

  • Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification
    Yunlong Bian, Chuang Gan, Xiao Liu, Fu Li, Xiang Long, Yandong Li, Heng Qi, Jie Zhou, Shilei Wen, Yuanqing Lin. (CVPR'17 WORKSHOP), Hawaii, USA, Aug 2017.
    Our team won 1st place in the ActivityNet Kinetics Challenge. Our best single model achieves 77.7% top-1 accuracy and 93.2% top-5 accuracy on the validation set.

  • Temporal Modeling Approaches for Large-scale Youtube-8M Video Understanding
    Fu Li, Chuang Gan, Xiao Liu, Yunlong Bian, Xiang Long, Yandong Li, Zhichao Li, Jie Zhou, Shilei Wen. (CVPR'17 WORKSHOP), Hawaii, USA, Aug 2017.
    Our team won 3rd place in the Google Cloud and YouTube-8M Video Understanding Challenge. Our best single model achieves 82.75% GAP@20 on the Kaggle public test set.

  • VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation
    Chuang Gan, Yandong Li, Haoxiang Li, Chen Sun, Boqing Gong. (ICCV'17), Venice, Italy Oct 2017
    We present preliminary work linking the instance segmentations provided by COCO to the questions and answers (QAs) in the VQA dataset, and name the collected links VQS. They transfer human supervision between previously separate tasks, offer more effective leverage for existing problems, and open the door to new research problems and models. We study two applications of the VQS data in this paper: supervised attention for VQA and a novel question-focused semantic segmentation task.

