One-Day Meeting: Trustworthy Multimodal Learning with Foundation Models: Bridging the Gap between AI Research and Real-World Applications
Wednesday 24 April 2024
Chairs: Chao Zhang (Toshiba Europe Ltd), Jindong Gu (University of Oxford), Shitong Sun (Queen Mary University of London), Onay Urfalioglu (Vivo Tech GmbH)
We invite academic and industry presentations, bringing together researchers interested in all aspects of foundation models (GPT-4, CLIP, SAM, etc.) and multimodal learning involving, but not limited to, image, video, audio, depth, text, drawings, laser, and IMU data.
Invited Speakers
- Guohao Li (University of Oxford & CAMEL-AI.org)
- Oleg Sinavski (Wayve, London)
- Da Li (Samsung Research)
- Rudra Poudel (Toshiba Europe)
- Ashkan Khakzar (University of Oxford)
Videos of Talks
Recordings of the talks (slides and speaker) from the day are available on our BMVA YouTube channel.
Programme
Start | End | Title
---|---|---
09:00 | 09:15 | Registration/Poster Set-up
09:15 | 09:20 | Opening Remarks
09:20 | 10:00 | Invited Speaker - Guohao Li
10:00 | 10:40 | Invited Speaker - Oleg Sinavski
10:40 | 11:05 | Coffee Break + Posters
11:05 | 12:20 | Accepted Talks - Pt. 1
12:20 | 13:20 | Lunch + Posters
13:20 | 14:00 | Invited Speaker - Da Li
14:00 | 15:15 | Accepted Talks - Pt. 2
15:15 | 15:40 | Coffee Break + Posters
15:40 | 16:20 | Invited Speaker - Rudra Poudel
16:20 | 17:00 | Invited Speaker - Ashkan Khakzar
17:00 | 17:05 | Past, Present, and Future of Vision-Language
Accepted Talks - Pt. 1 (15 mins each)
- Hang Dai (University of Glasgow) Multimodal BEV Fusion for Autonomous Driving
- Zhening Huang (University of Cambridge) OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
- Jian Hu (Queen Mary University of London) Is Instance-specific Manual Prompt Necessary for Promptable Semantic Segmentation?
- Xingchen Zhang (Imperial College London) Self-supervised RGBT tracking with Cross-input consistency
- Yongshuo Zong (University of Edinburgh) Safety Fine-Tuning at (Almost) No Cost: A Baseline for Vision Large Language Models
Accepted Talks - Pt. 2 (15 mins each)
- Chengzu Li (University of Cambridge) On Task Performance and Model Calibration with Supervised and Self-Ensembled In-Context Learning
- Ziquan Liu (Queen Mary University of London) Borrowing Treasures from Neighbors: In-Context Learning for Multimodal Learning with Missing Modalities and Data Scarcity
- Yimeng Gu (Queen Mary University of London) Domain Adaptive Multimodal Out-of-context News Detection
- Yinghao Ma (Queen Mary University of London) MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response
- Chen Chen (University of Sheffield) Unlocking the Value of Single Modality through Multi-Modal Knowledge Transfer with Large Language Models
Posters
- Cangxiong Chen (University of Bath) Understanding the Vulnerability of CLIP to Image Compression
- Charlie Grimshaw (University of Sheffield) Using Large Vision Language Models to detect Propaganda Techniques in memes
- Dean Slack (Durham University) Enhancing Next-Frame Video Prediction through Linguistic Scene Understanding
- Anum Masood (Harvard Medical School) Advancing Accuracy in Multimodal Medical Tasks through Bootstrapped Language-Image Pretraining (BLIP)
Meeting Location
The meeting will take place at:
British Computer Society (BCS), 25 Copthall Avenue, London EC2R 7BP
Registration
We keep the cost of attending these events as low as possible, so that there are no barriers to attendance for the whole computer vision community. The registration costs are as follows:
- BMVA Members: £20
- Non-BMVA Members: £40 (includes membership of the BMVA for 2024)
Both rates include lunch and refreshments for the day.