Koustava Goswami

Research Scientist 2 at Adobe Research

🏆 His work on Acrobat AI Assistant was named one of TIME's Best Inventions of 2024 and has been covered by multiple newspapers and channels

He is currently developing GenAI features for Adobe Acrobat Studio and is the core researcher behind Acrobat AI Assistant, working on cutting-edge Natural Language Processing and Artificial Intelligence technologies.


About Me

He is a Research Scientist 2 at Adobe Research with expertise in Natural Language Processing and Artificial Intelligence. Currently, he is developing GenAI features for Adobe Acrobat Studio and serves as the core researcher behind Acrobat AI Assistant.

Research Highlights

Advanced LLM Techniques


GRPO, post-training, and reinforcement learning algorithms for enhancing LLM capabilities while maintaining cost and memory efficiency.
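As a rough illustration of the group-relative idea behind GRPO (a generic sketch, not the production training code; the reward values are invented), the snippet below normalizes each sampled response's reward against its own group, removing the need for a separate value network:

```python
# Toy sketch of the group-relative advantage used in GRPO-style post-training.
# Not Adobe's implementation; the reward values below are made up for illustration.
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each response's reward against its own group (one group per prompt)."""
    rewards = np.asarray(rewards, dtype=float)   # shape: (group_size,)
    return (rewards - rewards.mean()) / (rewards.std() + eps)

# One prompt, four sampled responses scored by a (hypothetical) reward model.
advantages = group_relative_advantages([0.2, 0.9, 0.4, 0.7])
print(advantages)  # responses above the group mean get positive advantage

# Each advantage then weights the policy-gradient update for its response's tokens,
# which is why no separate value network is needed (saving memory and compute).
```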

Multimodal AI


Text-image alignment research focusing on charts, infographics, and structured visual content for enhanced document understanding.

Practical Applications


Cost and memory efficient solutions for real-world deployment, including reward-based post-training and optimized model architectures.

Real-world Impact


Document understanding, RAG systems, and agentic AI applications that transform how people interact with complex information.

Detailed Research Interests


1. Large-Scale LLM Post-Training

  • Structured information such as stylized text, tables, and text in different languages poses significant challenges for text-rich, long-context document understanding and reasoning. His recent work has focused on enhancing the "reading ability" of LLMs, using a range of fine-tuning and reinforcement learning algorithms. His current focus is on making smaller models more robust with enhanced reward-based post-training to save memory and cost.
  • He is working on making multilingual LLMs better at understanding large-scale documents, with the ability to perform cross-lingual and bilingual question answering and summary generation. As part of this research, he collaborated with other teams to collect richly annotated documents for post-training models.
  • Long-context documents are not just text; they also contain diverse images, so he has written research papers tackling the problem of text-image alignment. He is currently interested in exploring how to better align textual semantics with structure-rich images such as charts and infographics (a toy sketch of a contrastive alignment objective follows this list).
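A toy sketch of the contrastive text-image alignment objective mentioned in the last point above, using random placeholder embeddings in place of real chart/text encoders (an assumption-laden illustration, not the models from his papers):

```python
# CLIP-style symmetric contrastive loss over (chart image, caption) pairs.
# Random vectors stand in for real image/text encoder outputs.
import numpy as np

rng = np.random.default_rng(0)
B, D = 4, 8                                   # 4 (chart, caption) pairs, 8-dim embeddings
img = rng.normal(size=(B, D))                 # placeholder chart/infographic embeddings
txt = rng.normal(size=(B, D))                 # placeholder caption/text embeddings

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

img, txt = l2_normalize(img), l2_normalize(txt)
logits = img @ txt.T / 0.07                   # cosine similarities scaled by a temperature

def cross_entropy(logits, targets):
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

targets = np.arange(B)                        # matched pairs sit on the diagonal
loss = 0.5 * (cross_entropy(logits, targets) + cross_entropy(logits.T, targets))
print(loss)                                   # lower loss = text and image embeddings better aligned
```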

2. Model Explainability

  • In this domain, his research largely centers on textual and image citation/attribution, linking generated spans back to their source text or images. His work spans both training-free model explainability and small-scale model post-training (a toy attribution example follows this list).
  • Another thread in this domain is curating datasets that can be used to evaluate and train models capable of attributing generated content to source text and images.
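A toy illustration of the training-free attribution idea from the first point above: link a generated span to the most similar source sentence. The bag-of-words cosine here is only a stand-in for the stronger similarity signals (attention maps, dense embeddings) a real system would use:

```python
# Training-free attribution sketch: map a generated span to its closest source sentence.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def attribute(span: str, source_sentences: list[str]) -> tuple[int, float]:
    """Return (index, score) of the source sentence most similar to the generated span."""
    span_vec = Counter(span.lower().split())
    scores = [cosine(span_vec, Counter(s.lower().split())) for s in source_sentences]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

source = [
    "Revenue grew 12% year over year, driven by the Document Cloud segment.",
    "Operating expenses remained flat compared to the previous quarter.",
]
span = "Document Cloud drove a 12% year over year revenue increase."
print(attribute(span, source))   # -> (0, score): the span is attributed to the first sentence
```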

3. Agentic AI Based on Small Post-Trained LLMs

  • He is currently interested in two problems: (1) how to enable agents (text and multimodal) to efficiently interact with real environments and learn from them (planning and reasoning workflows, tool usage); and (2) how to enable LLMs to better understand the action space so they reason more effectively, which supports cost- and memory-efficient Retrieval-Augmented Generation (RAG). A minimal retrieval sketch follows below.
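A minimal sketch of the retrieval step in such a cost- and memory-efficient RAG loop; the `embed` function below is a hashed bag-of-words placeholder for whatever encoder a real system would use, and the generation call is omitted:

```python
# Retrieval step of a minimal RAG loop: rank document chunks by cosine similarity to the query.
import numpy as np

def embed(texts, dim=64):
    """Placeholder encoder: hashed bag-of-words vectors (a real system would use a learned embedding model)."""
    vecs = np.zeros((len(texts), dim))
    for i, text in enumerate(texts):
        for tok in text.lower().split():
            vecs[i, hash(tok) % dim] += 1.0
    return vecs / (np.linalg.norm(vecs, axis=1, keepdims=True) + 1e-8)

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q, c = embed([query]), embed(chunks)
    scores = (c @ q.T).ravel()
    return [chunks[i] for i in np.argsort(-scores)[:k]]

chunks = [
    "Section 3 describes the warranty terms for enterprise customers.",
    "Appendix A lists supported file formats for conversion.",
    "Section 5 covers refund policies and processing times.",
]
context = retrieve("What file formats can be converted?", chunks, k=1)
print(context)   # the retrieved chunk(s) would be passed to the LLM as grounding context
```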

30+

Publications

10+

Research Projects

12

Patent Applications

Publications

Selected papers are shown below; the full list can be found on DBLP/Google Scholar.

2025


Poetry in Pixels: Prompt Tuning for Poem Image Generation via Diffusion Models

Proceedings of the 31st International Conference on Computational Linguistics (COLING 2025)

Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs

Proceedings of the 2025 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2025)

2024


CoPL: Contextual Prompt Learning for Vision-Language Understanding

Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI 2024)

Iterative Multi-Granular Image Editing Using Diffusion Models

IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2024)

2023


A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2023)

Weakly-supervised Deep Cognate Detection Framework for Low-Resourced Languages Using Morphological Knowledge of Closely-Related Languages

arXiv preprint (2023)

Drilling Down into the Discourse Structure with LLMs for Long Document Question Answering

arXiv preprint (2023)

2022


Cross-lingual Document Understanding and Question Answering

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022)

Autonomous Car Driving with Deep Reinforcement Learning

IEEE Transactions on Intelligent Transportation Systems (2022)

2021


Sentiment Analysis in Multilingual Social Media

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021)

Neural Machine Translation for Low-Resource Languages

Computational Linguistics Journal (2021)

Curriculum Vitae

Download CV

Get the complete version of his curriculum vitae

Download PDF

Education

Ph.D. in Natural Language Processing

Insight Centre for Data Analytics, University of Galway, Ireland

Research Experience

Adobe Research (12/2022 – Present)

Collaboration

Open to Collaborations

I am always happy to collaborate with enthusiastic and talented students on topics related to multimodal large language models and reinforcement learning. If you are interested in working with me, feel free to reach out.

Multimodal LLMs
Reinforcement Learning
Document Understanding
AI Research

Get in Touch

Email

koustavagoswami@adobe.com

Location

Adobe Research
San Jose, CA, USA

Phone

Available upon request

Connect with him