


Dongxu Li 0003
Person information
- affiliation: DATA61-CSIRO, Australia
- affiliation: Australian National University (ANU), College of Engineering and Computer Science, Australia
- affiliation: Salesforce AI Research, Palo Alto, CA, USA
Other persons with the same name
- Dongxu Li (aka: Dong-Xu Li): disambiguation page
- Dongxu Li 0001: University of Electronic Science and Technology of China (UESTC), Shenzhen Institute for Advanced Study, China (and 1 more)
- Dongxu Li 0002 (aka: Dong-Xu Li 0002): Chinese Academy of Sciences (CAS), Xinjiang Technical Institute of Physics and Chemistry, China (and 1 more)
- Dongxu Li 0004: Peking University, Center for MRI Research, Beijing, China (and 1 more)
- Dongxu Li 0005: Beihang University, School of Electronics and Information Engineering, Beijing, China
- Dongxu Li 0006: Nanjing University of Information Science and Technology, School of Remote Sensing and Geomatics Engineering, China
- Dongxu Li 0007: R&D, General Motors Company Limited, GM Research and Development Center, Warren, MI, USA (and 1 more)
- Dongxu Li 0008: Huaqiao University, College of Materials Science and Engineering, Xiamen, China
- Dongxu Li 0009: Guangxi University of Science and Technology, School of Automation, Guangxi, China
- Dongxu Li 0010: East China Normal University (ECNU), Software Engineering Institute, Shanghai, China
2020 – today
- 2025
[c26]Yan Yang, Dongxu Li, Haoning Wu, Bei Chen, Liu Liu, Liyuan Pan, Junnan Li:
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks. ACL (Findings) 2025: 10883-10892
[c25]Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li:
Aria-UI: Visual Grounding for GUI Instructions. ACL (Findings) 2025: 22418-22433
[c24]Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu:
EZSR: Event-based Zero-Shot Recognition. CVPR 2025: 4628-4638
[c23]Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan S. Kankanhalli, Junnan Li:
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation. CVPR 2025: 8461-8474
[i27]Yan Yang, Dongxu Li, Haoning Wu, Bei Chen, Liu Liu, Liyuan Pan, Junnan Li:
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks. CoRR abs/2503.06885 (2025)
[i26]Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, Ran Xu, Liyuan Pan, Caiming Xiong, Junnan Li:
GTA1: GUI Test-time Scaling Agent. CoRR abs/2507.05791 (2025)
- 2024
[c22]Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles:
X-InstructBLIP: A Framework for Aligning Image, 3D, Audio, Video to LLMs and its Emergent Cross-Modal Reasoning. ECCV (45) 2024: 177-197
[c21]Haoning Wu, Dongxu Li, Bei Chen, Junnan Li:
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding. NeurIPS 2024
[i25]Haoning Wu, Dongxu Li, Bei Chen, Junnan Li:
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding. CoRR abs/2407.15754 (2024)
[i24]Yan Yang, Liyuan Pan, Dongxu Li, Liu Liu:
EZSR: Event-based Zero-Shot Recognition. CoRR abs/2407.21616 (2024)
[i23]Dongxu Li, Yudong Liu, Haoning Wu, Yue Wang, Zhiqi Shen, Bowen Qu, Xinyao Niu, Guoyin Wang, Bei Chen, Junnan Li:
Aria: An Open Multimodal Native Mixture-of-Experts Model. CoRR abs/2410.05993 (2024)
[i22]Ziyang Luo, Haoning Wu, Dongxu Li, Jing Ma, Mohan S. Kankanhalli, Junnan Li:
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation. CoRR abs/2411.13281 (2024)
[i21]Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, Junnan Li:
Aria-UI: Visual Grounding for GUI Instructions. CoRR abs/2412.16256 (2024)
- 2023
[j5]Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Wei Liu:
Enhanced Spatio-Temporal Interaction Learning for Video Deraining: Faster and Better. IEEE Trans. Pattern Anal. Mach. Intell. 45(1): 1287-1293 (2023)
[j4]Kaihao Zhang, Dongxu Li, Wenhan Luo, Jingyu Liu, Jiankang Deng, Wei Liu, Stefanos Zafeiriou:
EDFace-Celeb-1M: Benchmarking Face Hallucination With a Million-Scale Dataset. IEEE Trans. Pattern Anal. Mach. Intell. 45(3): 3968-3978 (2023)
[j3]Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong:
Linearized Relative Positional Encoding. Trans. Mach. Learn. Res. 2023 (2023)
[c20]Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi:
LAVIS: A One-stop Library for Language-Vision Intelligence. ACL (demo) 2023: 31-41
[c19]Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven C. H. Hoi:
From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models. CVPR 2023: 10867-10877
[c18]Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong:
Toeplitz Neural Network for Sequence Modeling. ICLR 2023
[c17]Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi:
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. ICML 2023: 19730-19742
[c16]Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven C. H. Hoi:
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. NeurIPS 2023
[c15]Dongxu Li, Junnan Li, Steven C. H. Hoi:
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. NeurIPS 2023
[i20]Junnan Li, Dongxu Li, Silvio Savarese, Steven C. H. Hoi:
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models. CoRR abs/2301.12597 (2023)
[i19]Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, Yiran Zhong:
Toeplitz Neural Network for Sequence Modeling. CoRR abs/2305.04749 (2023)
[i18]Wenliang Dai, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Junqi Zhao, Weisheng Wang, Boyang Li, Pascale Fung, Steven C. H. Hoi:
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning. CoRR abs/2305.06500 (2023)
[i17]Dongxu Li, Junnan Li, Steven C. H. Hoi:
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing. CoRR abs/2305.14720 (2023)
[i16]Zhen Qin, Weixuan Sun, Kaiyue Lu, Hui Deng, Dongxu Li, Xiaodong Han, Yuchao Dai, Lingpeng Kong, Yiran Zhong:
Linearized Relative Positional Encoding. CoRR abs/2307.09270 (2023)
[i15]Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles:
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning. CoRR abs/2311.18799 (2023)
- 2022
[j2]Wenjia Niu, Kaihao Zhang, Dongxu Li, Wenhan Luo:
Four-player GroupGAN for weak expression recognition via latent expression magnification. Knowl. Based Syst. 251: 109304 (2022)
[c14]Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li:
Transcribing Natural Languages for the Deaf via Neural Editing Programs. AAAI 2022: 11991-11999
[c13]Chenchen Xu, Dongxu Li, Hongdong Li, Hanna Suominen, Ben Swift:
Automatic Gloss Dictionary for Sign Language Learners. ACL (demo) 2022: 83-92
[c12]Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi:
Align and Prompt: Video-and-Language Pre-training with Entity Prompts. CVPR 2022: 4943-4953
[c11]Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong:
The Devil in Linear Transformer. EMNLP 2022: 7025-7041
[c10]Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong:
cosFormer: Rethinking Softmax In Attention. ICLR 2022
[c9]Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi:
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. ICML 2022: 12888-12900
[i14]Junnan Li, Dongxu Li, Caiming Xiong, Steven C. H. Hoi:
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation. CoRR abs/2201.12086 (2022)
[i13]Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong:
cosFormer: Rethinking Softmax in Attention. CoRR abs/2202.08791 (2022)
[i12]Dongxu Li, Junnan Li, Hung Le, Guangsen Wang, Silvio Savarese, Steven C. H. Hoi:
LAVIS: A Library for Language-Vision Intelligence. CoRR abs/2209.09019 (2022)
[i11]Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, Yiran Zhong:
The Devil in Linear Transformer. CoRR abs/2210.10340 (2022)
[i10]Jiaxian Guo, Junnan Li, Dongxu Li, Anthony Meng Huat Tiong, Boyang Li, Dacheng Tao, Steven C. H. Hoi:
From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models. CoRR abs/2212.10846 (2022)
- 2021
[j1]Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren:
Dual Attention-in-Attention Model for Joint Rain Streak and Raindrop Removal. IEEE Trans. Image Process. 30: 7608-7619 (2021)
[c8]Dongxu Li, Chenchen Xu, Kaihao Zhang, Xin Yu, Yiran Zhong, Wenqi Ren, Hanna Suominen, Hongdong Li:
ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring. CVPR 2021: 7721-7731
[c7]Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Björn Stenger, Wei Liu, Hongdong Li, Ming-Hsuan Yang:
Benchmarking Ultra-High-Definition Image Super-resolution. ICCV 2021: 14749-14758
[i9]Dongxu Li, Chenchen Xu, Kaihao Zhang, Xin Yu, Yiran Zhong, Wenqi Ren, Hanna Suominen, Hongdong Li:
ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring. CoRR abs/2103.04260 (2021)
[i8]Kaihao Zhang, Dongxu Li, Wenhan Luo, Wenqi Ren, Lin Ma, Hongdong Li:
Dual Attention-in-Attention Model for Joint Rain Streak and Raindrop Removal. CoRR abs/2103.07051 (2021)
[i7]Kaihao Zhang, Dongxu Li, Wenhan Luo, Wen-Yan Lin, Fang Zhao, Wenqi Ren, Wei Liu, Hongdong Li:
Enhanced Spatio-Temporal Interaction Learning for Video Deraining: A Faster and Better Framework. CoRR abs/2103.12318 (2021)
[i6]Kaihao Zhang, Dongxu Li, Wenhan Luo, Jingyu Liu, Jiankang Deng, Wei Liu, Stefanos Zafeiriou:
EDFace-Celeb-1M: Benchmarking Face Hallucination with a Million-scale Dataset. CoRR abs/2110.05031 (2021)
[i5]Dongxu Li, Junnan Li, Hongdong Li, Juan Carlos Niebles, Steven C. H. Hoi:
Align and Prompt: Video-and-Language Pre-training with Entity Prompts. CoRR abs/2112.09583 (2021)
[i4]Dongxu Li, Chenchen Xu, Liu Liu, Yiran Zhong, Rong Wang, Lars Petersson, Hongdong Li:
Transcribing Natural Languages for The Deaf via Neural Editing Programs. CoRR abs/2112.09600 (2021)
- 2020
[c6]Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li:
Transferring Cross-Domain Knowledge for Video Sign Language Recognition. CVPR 2020: 6204-6213
[c5]Dongxu Li, Stanley Bak, Sergiy Bogomolov:
Reachability Analysis of Nonlinear Systems Using Hybridization and Dynamics Scaling. FORMATS 2020: 265-282
[c4]Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Benjamin Swift, Hanna Suominen, Hongdong Li:
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation. NeurIPS 2020
[c3]Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, Hongdong Li:
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. WACV 2020: 1448-1458
[i3]Dongxu Li, Xin Yu, Chenchen Xu, Lars Petersson, Hongdong Li:
Transferring Cross-domain Knowledge for Video Sign Language Recognition. CoRR abs/2003.03703 (2020)
[i2]Dongxu Li, Chenchen Xu, Xin Yu, Kaihao Zhang, Ben Swift, Hanna Suominen, Hongdong Li:
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation. CoRR abs/2010.05468 (2020)
2010 – 2019
- 2019
[c2]Sergiy Bogomolov, Goran Frehse, Amit Gurung, Dongxu Li, Georg Martius, Rajarshi Ray:
Falsification of hybrid systems using symbolic reachability and trajectory splicing. HSCC 2019: 1-10
[i1]Dongxu Li, Cristian Rodriguez Opazo, Xin Yu, Hongdong Li:
Word-level Deep Sign Language Recognition from Video: A New Large-scale Dataset and Methods Comparison. CoRR abs/1910.11006 (2019)
- 2018
[c1]Dongxu Li, Enrico Scala, Patrik Haslum, Sergiy Bogomolov:
Effect-Abstraction Based Relaxation for Linear Numeric Planning. IJCAI 2018: 4787-4793
last updated on 2026-05-06 00:11 CEST by the dblp team
all metadata released as open data under CC0 1.0 license







