Updated 2025-04-21
- Cedille: A large autoregressive French language model
- The Wisdom of Hindsight Makes Language Models Better Instruction Followers
- ChatGPT: A Study on its Utility for Ubiquitous Software Engineering Tasks
- Query2doc: Query Expansion with Large Language Models
- The Internal State of an LLM Knows When its Lying
- Structured information extraction from complex scientific text with fine-tuned large language models
- TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models
- Large Language Models Encode Clinical Knowledge
- PoET: A generative model of protein families as sequences-of-sequences
- Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
- Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services
- SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
- Modeling Protein Using Large-scale Pretrain Language Model
- A Watermark for Large Language Models
- GPT is becoming a Turing machine: Here are some ways to program it
- Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
- Large Language Models are Zero-Shot Reasoners
- From Images to Textual Prompts: Zero-shot VQA with Frozen Large Language Models
- How is ChatGPT's behavior changing over time?
- Meta-Transformer: A Unified Framework for Multimodal Learning
- Scaling Relationship on Learning Mathematical Reasoning with Large Language Models
- Getting More out of Large Language Models for Proofs
- Teaching Small Language Models to Reason
- Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes
- Learning to Retrieve In-Context Examples for Large Language Models
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- Context-Aware Abbreviation Expansion Using Large Language Models
- Focused Transformer: Contrastive Training for Context Scaling
- Flash normalization: fast RMSNorm for LLMs
- MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
- Long-range Language Modeling with Self-retrieval
- Alexa, play with robot: Introducing the First Alexa Prize SimBot Challenge on Embodied AI
- Towards Generalist Biomedical AI
- Shortcut Learning of Large Language Models in Natural Language Understanding
- Quantifying Memorization Across Neural Language Models
- LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models
- Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
- Copy Is All You Need
- Automatic Chain of Thought Prompting in Large Language Models
- Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models
- Decomposed Prompting: A Modular Approach for Solving Complex Tasks
- Evaluating the Text-to-SQL Capabilities of Large Language Models
- On the Origin of LLMs: An Evolutionary Tree and Graph for 15,821 Large Language Models
- BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
- Are Emergent Abilities of Large Language Models a Mirage?
- Enhancing Network Management Using Code Generated by Large Language Models
- Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks
- ThinkSum: Probabilistic reasoning over sets using large language models
- On the Tool Manipulation Capability of Open-source Large Language Models
- Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
- WavJourney: Compositional Audio Creation with Large Language Models
- ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course
- Secrets of RLHF in Large Language Models Part I: PPO
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
- One-for-All: Generalized LoRA for Parameter-Efficient Fine-tuning
- Prototypical Fine-tuning: Towards Robust Performance Under Varying Data Sizes
- Challenges and Applications of Large Language Models
- SPOT: Knowledge-Enhanced Language Representations for Information Extraction
- Kosmos-2: Grounding Multimodal Large Language Models to the World
- Deep Language Networks: Joint Prompt Training of Stacked LLMs using Variational Inference
- SKILL: Structured Knowledge Infusion for Large Language Models
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Understanding Social Reasoning in Language Models with Language Models
- The Science of Detecting LLM-Generated Texts
- CausalLM is not optimal for in-context learning
- Questioning the Survey Responses of Large Language Models
- Extending Context Window of Large Language Models via Positional Interpolation
- ChatGPT and a New Academic Reality: Artificial Intelligence-Written Research Papers and the Ethics of the Large Language Models in Scholarly Publishing
- Probing Factually Grounded Content Transfer with Factual Ablation
- Teach LLMs to Personalize -- An Approach inspired by Writing Education
- Pre-Trained Large Language Models for Industrial Control
- WebGLM: Towards An Efficient Web-Enhanced Question Answering System with Human Preferences
- LongNet: Scaling Transformers to 1,000,000,000 Tokens
- Self-Alignment with Instruction Backtranslation
- Guiding Pretraining in Reinforcement Learning with Large Language Models
- Large Language Models are Zero-Shot Rankers for Recommender Systems
- Model evaluation for extreme risks
- Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL
- A Simple and Effective Pruning Approach for Large Language Models
- Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors
- Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback
- Stack More Layers Differently: High-Rank Training Through Low-Rank Updates
- TableGPT: Towards Unifying Tables, Nature Language and Commands into One GPT
- VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
- LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
- PromptChainer: Chaining Large Language Model Prompts through Visual Programming
- PIPPA: A Partially Synthetic Conversational Dataset
- Let's Verify Step by Step
- Evaluating Large Language Models on a Highly-specialized Topic, Radiation Oncology Physics
- SambaNova SN40L: Scaling the AI Memory Wall with Dataflow and Composition of Experts
- Large Language Models Are Reasoning Teachers
- GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models
- Large Language Models as Tax Attorneys: A Case Study in Legal Capabilities Emergence
- Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations
- Connecting Neural Response measurements & Computational Models of language: a non-comprehensive guide
- Accelerating LLM Inference with Staged Speculative Decoding
- Large Language Models for Supply Chain Optimization
- Do Large Language Models know what humans know?
- Large Language Models are Few-shot Testers: Exploring LLM-based General Bug Reproduction
- Faithful Chain-of-Thought Reasoning
- AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts
- Superposition of many models into one
- Learning to Model the World with Language
- SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
- Unifying Large Language Models and Knowledge Graphs: A Roadmap
- RAVEN: In-Context Learning with Retrieval Augmented Encoder-Decoder Language Models
- QLoRA: Efficient Finetuning of Quantized LLMs
- Trustworthy LLMs: a Survey and Guideline for Evaluating Large Language Models' Alignment
- Co-Writing with Opinionated Language Models Affects Users' Views
- Language models show human-like content effects on reasoning
- Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
- Code Generation Tools (Almost) for Free? A Study of Few-Shot, Pre-Trained Language Models on Code
- OpenAGI: When LLM Meets Domain Experts
- Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
- Beyond Generating Code: Evaluating GPT on a Data Visualization Course
- Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
- UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition
- LLM-Rec: Personalized Recommendation via Prompting Large Language Models
- Studying Large Language Model Generalization with Influence Functions
- Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change)
- From Sparse to Soft Mixtures of Experts
- Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization
- INT2.1: Towards Fine-Tunable Quantized Large Language Models with Error Correction through Low-Rank Adaptation
- Code Prompting: a Neural Symbolic Method for Complex Reasoning in Large Language Models
- Large Language Model Guided Tree-of-Thought
- Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
- LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition
- When Geometric Deep Learning Meets Pretrained Protein Language Models
- Beyond Black Box AI-Generated Plagiarism Detection: From Sentence to Document Level
- Language models are weak learners
- How Many Demonstrations Do You Need for In-context Learning?
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
- Gorilla: Large Language Model Connected with Massive APIs
- Automatic Generation of Programming Exercises and Code Explanations using Large Language Models
- Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models
- Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models
- WebArena: A Realistic Web Environment for Building Autonomous Agents
- Language Models can Solve Computer Tasks
- ChatGPT Is on the Horizon: Could a Large Language Model Be All We Need for Intelligent Transportation?
- Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling
- Invariant Language Modeling
- Solving Quantitative Reasoning Problems with Language Models
- Personality Traits in Large Language Models
- Prompting Large Language Models with Speech Recognition Abilities
- Selective Annotation Makes Language Models Better Few-Shot Learners
- Using Captum to Explain Generative Language Models
- Fine-Tuning Language Models with Just Forward Passes
- In-context Autoencoder for Context Compression in a Large Language Model
- Entity Projection via Machine Translation for Cross-Lingual NER
- OctoPack: Instruction Tuning Code Large Language Models
- AlpaGasus: Training A Better Alpaca with Fewer Data
- Large Language Models Are Human-Level Prompt Engineers
- DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
- Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning
- FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
- Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach
- Large Language Models Can Self-Improve
- Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
- More Agents Is All You Need
- Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
- Teaching Algorithmic Reasoning via In-context Learning
- SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning
- BuboGPT: Enabling Visual Grounding in Multi-Modal LLMs
- The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python
- KI-BERT: Infusing Knowledge Context for Better Language and Domain Understanding
- Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Automatic Evaluation of Attribution by Large Language Models
- Generative Agents: Interactive Simulacra of Human Behavior
- ALERT: Adapting Language Models to Reasoning Tasks
- How does the pre-training objective affect what large language models learn about linguistic properties?
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
- LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models
- From Word Models to World Models: Translating from Natural Language to the Probabilistic Language of Thought
- Open-Source Large Language Models Outperform Crowd Workers and Approach ChatGPT in Text-Annotation Tasks
- Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
- FLIRT: Feedback Loop In-context Red Teaming
- News Summarization and Evaluation in the Era of GPT-3
- Galactica: A Large Language Model for Science
- Towards Reasoning in Large Language Models: A Survey
- Chain-Of-Thought Prompting Under Streaming Batch: A Case Study
- Shepherd: A Critic for Language Model Generation
- Emergent autonomous scientific research capabilities of large language models
- Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
- Social Simulacra: Creating Populated Prototypes for Social Computing Systems
- HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face
- LLMs as Workers in Human-Computational Algorithms? Replicating Crowdsourcing Pipelines with LLMs
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
- Universal and Transferable Adversarial Attacks on Aligned Language Models
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
- Complexity-Based Prompting for Multi-Step Reasoning
- The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
- FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance
- Scaling TransNormer to 175 Billion Parameters
- CodeTF: One-stop Transformer Library for State-of-the-art Code LLM
- A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation
- Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
- Learning ASR pathways: A sparse multilingual ASR model
- Stay on topic with Classifier-Free Guidance
- Constitutional AI: Harmlessness from AI Feedback
- Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis
- Teaching Arithmetic to Small Transformers
- Demystifying GPT Self-Repair for Code Generation
- Performance of ChatGPT on USMLE: Unlocking the Potential of Large Language Models for AI-Assisted Medical Education
- Link-Context Learning for Multimodal LLMs
- Large Language Models Perform Diagnostic Reasoning
- InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback
- AgentBench: Evaluating LLMs as Agents
- Xmodel-LM Technical Report
- Simple synthetic data reduces sycophancy in large language models
- Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models
- Re-visiting Automated Topic Model Evaluation with Large Language Models
- Large Language Models are Effective Text Rankers with Pairwise Ranking Prompting
- Adaptive Test Generation Using a Large Language Model
- Large Language Models Are Implicitly Topic Models: Explaining and Finding Good Demonstrations for In-Context Learning
- SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
- PaLM: Scaling Language Modeling with Pathways
- Teaching Large Language Models to Self-Debug
- Building Cooperative Embodied Agents Modularly with Large Language Models
- Urdu text in natural scene images: a new dataset and preliminary text detection
- LIMA: Less Is More for Alignment
- Leveraging Large Language Models for Topic Classification in the Domain of Public Affairs
- GPT-NER: Named Entity Recognition via Large Language Models
- Say What You Mean! Large Language Models Speak Too Positively about Negative Commonsense Knowledge
- Code as Policies: Language Model Programs for Embodied Control
- Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification
- From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models
- Summary of ChatGPT/GPT-4 Research and Perspective Towards the Future of Large Language Models
- Inspecting and Editing Knowledge Representations in Language Models
- TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents
- Large language models effectively leverage document-level context for literary translation, but critical errors persist
- Med-Flamingo: a Multimodal Medical Few-shot Learner
- CompactifAI: Extreme Compression of Large Language Models using Quantum-Inspired Tensor Networks
- Jigsaw: Large Language Models meet Program Synthesis
- Large Language Models Struggle to Learn Long-Tail Knowledge
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Textbooks Are All You Need
- Crowd Score: A Method for the Evaluation of Jokes using Large Language Model AI Voters as Judges
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
- Orca: Progressive Learning from Complex Explanation Traces of GPT-4
- Skills-in-Context Prompting: Unlocking Compositionality in Large Language Models
- Three Bricks to Consolidate Watermarks for Large Language Models
- The Devil is in the Errors: Leveraging Large Language Models for Fine-grained Machine Translation Evaluation
- FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
- One-shot Machine Teaching: Cost Very Few Examples to Converge Faster
- Theory of Mind May Have Spontaneously Emerged in Large Language Models
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models
- Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
- Tiny LVLM-eHub: Early Multimodal Experiments with Bard
- Language Is Not All You Need: Aligning Perception with Language Models
- Mind's Eye: Grounded Language Model Reasoning through Simulation
- StarCoder: may the source be with you!
- Self-Critique Prompting with Large Language Models for Inductive Instructions
- PaLM 2 Technical Report
- Repository-Level Prompt Generation for Large Language Models of Code
- L-Eval: Instituting Standardized Evaluation for Long Context Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
- Differentially Private Fine-tuning of Language Models
- A Latent Space Theory for Emergent Abilities in Large Language Models
- Reflexion: Language Agents with Verbal Reinforcement Learning
- Ambient Adventures: Teaching ChatGPT on Developing Complex Stories
- LEACE: Perfect linear concept erasure in closed form
- Machine Psychology: Investigating Emergent Capabilities and Behavior in Large Language Models Using Psychological Methods
- A PhD Student's Perspective on Research in NLP in the Era of Very Large Language Models
- Voyager: An Open-Ended Embodied Agent with Large Language Models
- FinGPT: Open-Source Financial Large Language Models
- Block Belief Propagation for Parameter Learning in Markov Random Fields
- Lost in the Middle: How Language Models Use Long Contexts
- Artificial Artificial Artificial Intelligence: Crowd Workers Widely Use Large Language Models for Text Production Tasks
- Ada-Ranker: A Data Distribution Adaptive Ranking Paradigm for Sequential Recommendation
- Baize: An Open-Source Chat Model with Parameter-Efficient Tuning on Self-Chat Data
- BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents
- Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding
- The Hydra Effect: Emergent Self-repair in Language Model Computations
- Educational data augmentation in physics education research using ChatGPT
- PolyLM: An Open Source Polyglot Large Language Model
- Towards Expert-Level Medical Question Answering with Large Language Models
- Is GPT-4 a Good Data Analyst?
- Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
- Parsel: Algorithmic Reasoning with Language Models by Composing Decompositions
- ChatGPT is fun, but it is not funny! Humor is still challenging Large Language Models
- Seeing ChatGPT Through Students' Eyes: An Analysis of TikTok Data
- LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond
- ReAct: Synergizing Reasoning and Acting in Language Models
- Augmenting Language Models with Long-Term Memory
- BloombergGPT: A Large Language Model for Finance
- A Systematic Evaluation of Large Language Models of Code
- GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
- Robot Task Planning and Situation Handling in Open Worlds
- Large Language Models are Competitive Near Cold-start Recommenders for Language- and Item-based Preferences
- Emergent Abilities of Large Language Models
- Can Large Language Models design a Robot?
- KoLA: Carefully Benchmarking World Knowledge of Large Language Models
- Clinical Camel: An Open-Source Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding
- DarkBERT: A Language Model for the Dark Side of the Internet
- Measuring Faithfulness in Chain-of-Thought Reasoning
- Retentive Network: A Successor to Transformer for Large Language Models
- Dissociating language and thought in large language models: a cognitive perspective
- Large Language Models are Better Reasoners with Self-Verification
- Can large language models reason about medical questions?
- Towards Revealing the Mystery behind Chain of Thought: A Theoretical Perspective
- ARB: Advanced Reasoning Benchmark for Large Language Models
- Rethinking with Retrieval: Faithful Large Language Model Inference
- A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models
- Becoming self-instruct: introducing early stopping criteria for minimal instruct tuning
- Explainable Verbal Reasoner Plus (EVR+): A Natural Language Reasoning Framework that Supports Diverse Compositional Reasoning
- Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
- Large Language Models as Corporate Lobbyists
- MetaGPT: Meta Programming for Multi-Agent Collaborative Framework
- Data-Driven Approach for Formality-Sensitive Machine Translation: Language-Specific Handling and Synthetic Data Generation
- OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models
- Talking About Large Language Models
- Platypus: Quick, Cheap, and Powerful Refinement of LLMs
- Large Language Models Can Be Easily Distracted by Irrelevant Context
- Unleashing Cognitive Synergy in Large Language Models: A Task-Solving Agent through Multi-Persona Self-Collaboration
- OpenICL: An Open-Source Framework for In-context Learning
- Emergence of Maps in the Memories of Blind Navigation Agents
- PMC-LLaMA: Further Finetuning LLaMA on Medical Papers
- DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
- LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
- UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation
- Learning to Reason and Memorize with Self-Notes
- ChemCrow: Augmenting large-language models with chemistry tools
- Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor
- Learning to Compress Prompts with Gist Tokens
- Unlimiformer: Long-Range Transformers with Unlimited Length Input
- StructGPT: A General Framework for Large Language Model to Reason over Structured Data
- ChatGPT: Applications, Opportunities, and Threats
- Memory Augmented Large Language Models are Computationally Universal
- PaLM-E: An Embodied Multimodal Language Model
- M2T: Masking Transformers Twice for Faster Decoding
- Harnessing the Power of LLMs in Practice: A Survey on ChatGPT and Beyond
- A Comprehensive Capability Analysis of GPT-3 and GPT-3.5 Series Models
- DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
- Auditing large language models: a three-layered approach
- Language models in molecular discovery
- Offsite-Tuning: Transfer Learning without Full Model
- MusicLM: Generating Music From Text
- Context-faithful Prompting for Large Language Models
- SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
- Hungry Hungry Hippos: Towards Language Modeling with State Space Models
- Halo: Estimation and Reduction of Hallucinations in Open-Source Weak Large Language Models
- The Costly Dilemma: Generalization, Evaluation and Cost-Optimal Deployment of Large Language Models
- GPTutor: a ChatGPT-powered programming tool for code explanation
- Larger language models do in-context learning differently
- MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
- Minding Language Models' (Lack of) Theory of Mind: A Plug-and-Play Multi-Character Belief Tracker
- ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge
- Multimodal Chain-of-Thought Reasoning in Language Models
- Recitation-Augmented Language Models
- Hyena Hierarchy: Towards Larger Convolutional Language Models
- Eight Things to Know about Large Language Models
- PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
- A Survey on Model Compression for Large Language Models
- Active Retrieval Augmented Generation
- Toolformer: Language Models Can Teach Themselves to Use Tools
- Evaluating Verifiability in Generative Search Engines
- Augmented Language Models: a Survey
- Evaluating ChatGPT's Information Extraction Capabilities: An Assessment of Performance, Explainability, Calibration, and Faithfulness
- Giraffe: Adventures in Expanding Context Lengths in LLMs
- LLM As DBA
- Scaling Transformer to 1M tokens and beyond with RMT
- TidyBot: Personalized Robot Assistance with Large Language Models
- Exploring the Intersection of Large Language Models and Agent-Based Modeling via Prompt Engineering
- Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability
- Active Prompting with Chain-of-Thought for Large Language Models
- A Categorical Archive of ChatGPT Failures
- Artificial muses: Generative Artificial Intelligence Chatbots Have Risen to Human-Level Creativity
- Better Language Models of Code through Self-Improvement
- DERA: Enhancing Large Language Model Completions with Dialog-Enabled Resolving Agents
- The Capacity for Moral Self-Correction in Large Language Models
- Poisoning Language Models During Instruction Tuning
- Prompt2Model: Generating Deployable Models from Natural Language Instructions
- Data Selection for Language Models via Importance Resampling
- Enabling Conversational Interaction with Mobile UI using Large Language Models
- Evidence of Meaning in Language Models Trained on Programs
- Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
- Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
- Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
- Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models
- Symbol tuning improves in-context learning in language models
- REPLUG: Retrieval-Augmented Black-Box Language Models
- Why do Nearest Neighbor Language Models Work?
- Prismer: A Vision-Language Model with An Ensemble of Experts
- AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models
- Self-evolving Agents with reflective and memory-augmented abilities
- CALYPSO: LLMs as Dungeon Masters' Assistants
- Mind your Language (Model): Fact-Checking LLMs and their Role in NLP Research and Practice
- Code Llama: Open Foundation Models for Code
- Ground Manipulator Primitive Tasks to Executable Actions using Large Language Models
- Faithful to Whom? Questioning Interpretability Measures in NLP
- Evaluating Large Language Models on Graphs: Performance Insights and Comparative Analysis
- Large Language Models on Wikipedia-Style Survey Generation: an Evaluation in NLP Concepts
- How Good Are Large Language Models at Out-of-Distribution Detection?
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions
- Can Large Language Models Find And Fix Vulnerable Software?
- Large Language Models for Software Engineering: A Systematic Literature Review
- Informed Named Entity Recognition Decoding for Generative Language Models
- Use of LLMs for Illicit Purposes: Threats, Prevention Measures, and Vulnerabilities
- Simple is Better and Large is Not Enough: Towards Ensembling of Foundational Language Models
- Better Zero-Shot Reasoning with Role-Play Prompting
- Exploring Equation as a Better Intermediate Meaning Representation for Numerical Reasoning
- Are ChatGPT and GPT-4 Good Poker Players? -- A Pre-Flop Analysis
- A Survey on Large Language Model based Autonomous Agents
- Using Large Language Models for Cybersecurity Capture-The-Flag Challenges and Certification Questions
- Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
- AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
- Evaluating ChatGPT and GPT-4 for Visual Programming
- Through the Lens of Core Competency: Survey on Evaluation of Large Language Models
- D4: Improving LLM Pretraining via Document De-Duplication and Diversification
- Cabrita: closing the gap for foreign languages
- GPT-in-the-Loop: Adaptive Decision-Making for Multiagent Systems
- ProAgent: Building Proactive Cooperative AI with Large Language Models
- Instruction Position Matters in Sequence Generation with Large Language Models
- Knowledge-Enhanced Multi-Label Few-Shot Product Attribute-Value Extraction
- SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
- LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
- Large Language Model as Autonomous Decision Maker
- Large Language Models as Superpositions of Cultural Perspectives
- Activation Addition: Steering Language Models Without Optimization
- Enhancing Recommender Systems with Large Language Model Reasoning Graphs
- GPTEval: A Survey on Assessments of ChatGPT and GPT-4
- An Empirical Study on Challenging Math Problem Solving with GPT-4
- Forward-Backward Reasoning in Large Language Models for Verification
- Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights using Generative AI
- Dynamic Planning with a LLM
- "Guinea Pig Trials" Utilizing GPT: A Novel Smart Agent-Based Modeling Approach for Studying Firm Competition and Collusion
- Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models
- Bridging the Gap: Deciphering Tabular Data Using Large Language Model
- The Pile: An 800GB Dataset of Diverse Text for Language Modeling
- Prompting Is Programming: A Query Language for Large Language Models
- EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models
- Knowledge Graph Prompting for Multi-Document Question Answering
- GPT detectors are biased against non-native English writers
- GradientCoin: A Peer-to-Peer Decentralized Large Language Models
- RaLLe: A Framework for Developing and Evaluating Retrieval-Augmented Large Language Models
- IncreLoRA: Incremental Parameter Allocation Method for Parameter-Efficient Fine-tuning
- Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
- Time Travel in LLMs: Tracing Data Contamination in Large Language Models
- Can Language Models Learn to Listen?
- Detecting The Corruption Of Online Questionnaires By Artificial Intelligence
- Towards an Understanding of Large Language Models in Software Engineering Tasks
- YaRN: Efficient Context Window Extension of Large Language Models
- An Examination of the Compositionality of Large Generative Vision-Language Models
- Company Similarity using Large Language Models
- LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs
- Instruction Tuning for Large Language Models: A Survey
- Language to Rewards for Robotic Skill Synthesis
- Is There Any Social Principle for LLM-Based Agents?
- A Study on Robustness and Reliability of Large Language Model Code Generation
- Leveraging Large Language Models for Pre-trained Recommender Systems
- Mind vs. Mouth: On Measuring Re-judge Inconsistency of Social Bias in Large Language Models
- LLaSM: Large Language and Speech Model
- SpikingBERT: Distilling BERT to Train Spiking Language Models Using Implicit Differentiation
- DiagGPT: An LLM-based Chatbot with Automatic Topic Management for Task-Oriented Dialogue
- FoodGPT: A Large Language Model in Food Testing Domain with Incremental Pre-training and Knowledge Graph Prompt
- ChatEDA: A Large Language Model Powered Autonomous Agent for EDA
- AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
- Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
- Pretraining on the Test Set Is All You Need
- The AI Revolution in Education: Will AI Replace or Assist Teachers in Higher Education?
- Reinforced Self-Training (ReST) for Language Modeling
- Fast Inference from Transformers via Speculative Decoding
- LoRA: Low-Rank Adaptation of Large Language Models
- Catalyst Property Prediction with CatBERTa: Unveiling Feature Exploration Strategies through Large Language Models
- AI Deception: A Survey of Examples, Risks, and Potential Solutions
- RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback
- Towards Applying Powerful Large AI Models in Classroom Teaching: Opportunities, Challenges and Prospects
- ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- Blockwise Parallel Decoding for Deep Autoregressive Models
- Assigning AI: Seven Approaches for Students, with Prompts
- Conformal Prediction with Large Language Models for Multi-Choice Question Answering
- Attention: Marginal Probability is All You Need?
- Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test
- Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
- MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following
- Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models
- OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
- XGen-7B Technical Report
- LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models
- Can Programming Languages Boost Each Other via Instruction Tuning?
- The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants
- Efficient RLHF: Reducing the Memory Usage of PPO
- Universal Self-adaptive Prompting
- ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models
- Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior
- One Wide Feedforward is All You Need
- Better Zero-Shot Reasoning with Self-Adaptive Prompting
- BioCoder: A Benchmark for Bioinformatics Code Generation with Contextual Pragmatic Knowledge
- DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models
- Graph of Thoughts: Solving Elaborate Problems with Large Language Models
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
- AnomalyGPT: Detecting Industrial Anomalies using Large Vision-Language Models
- Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
- SoTaNa: The Open-Source Software Development Assistant
- GPT Can Solve Mathematical Problems Without a Calculator
- Physically Grounded Vision-Language Models for Robotic Manipulation
- FacTool: Factuality Detection in Generative AI -- A Tool Augmented Framework for Multi-Task and Multi-Domain Scenarios
- FLM-101B: An Open LLM and How to Train It with $100K Budget
- LaMDA: Language Models for Dialog Applications
- LMDX: Language Model-based Document Information Extraction and Localization
- Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers
- Do Multilingual Language Models Think Better in English?
- The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute
- TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
- Textbooks Are All You Need II: phi-1.5 technical report
- Replacing softmax with ReLU in Vision Transformers
- Investigating Answerability of LLMs for Long-Form Question Answering
- Vector Search with OpenAI Embeddings: Lucene Is All You Need
- The Rise and Potential of Large Language Model Based Agents: A Survey
- Cure the headache of Transformers via Collinear Constrained Attention
- Uncovering mesa-optimization algorithms in Transformers
- Large Language Models for Compiler Optimization
- CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages
- Chain-of-Verification Reduces Hallucination in Large Language Models
- AstroLLaMA: Towards Specialized Foundation Models in Astronomy
- [WIP] Jailbreak Paradox: The Achilles' Heel of LLMs
- Compositional Foundation Models for Hierarchical Planning
- AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors in Agents
- Sparse Autoencoders Find Highly Interpretable Features in Language Models
- DreamLLM: Synergistic Multimodal Comprehension and Creation
- Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference Using Sorted Fine-Tuning (SoFT)
- Improving Language Models with Advantage-based Offline Policy Gradients
- Improving Factuality and Reasoning in Language Models through Multiagent Debate
- From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting
- BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
- Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
- Are Large Language Model-based Evaluators the Solution to Scaling Up Multilingual Evaluation?
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
- Boolformer: Symbolic Regression of Logic Functions with Transformers
- Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?
- No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
- TP-Aware Dequantization
- LASER: LLM Agent with State-Space Exploration for Web Navigation
- An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
- MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
- Baichuan 2: Open Large-scale Language Models
- Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
- Efficient Benchmarking (of Language Models)
- Context is Environment
- Analyzing Transformer Dynamics as Movement through Embedding Space
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
- RMT: Retentive Networks Meet Vision Transformers
- Stack-and-Delay: a new codebook pattern for music generation
- Neurons in Large Language Models: Dead, N-gram, Positional
- Large Language Model for Science: A Study on P vs. NP
- LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset
- Data Augmentation for Spoken Language Understanding via Pretrained Language Models
- Petals: Collaborative Inference and Fine-tuning of Large Models
- Scaling Laws for Sparsely-Connected Foundation Models
- Kosmos-2.5: A Multimodal Literate Model
- PDFTriage: Question Answering over Long, Structured Documents
- Statistical Rejection Sampling Improves Preference Optimization
- Stabilizing RLHF through Advantage Model and Selective Rehearsal
- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
- Leveraging Contextual Information for Effective Entity Salience Detection
- NExT-GPT: Any-to-Any Multimodal LLM
- Are Emergent Abilities in Large Language Models just In-Context Learning?
- RACE: Large-scale ReAding Comprehension Dataset From Examinations
- Large-Scale Automatic Audiobook Creation
- Recovering from Privacy-Preserving Masking with Large Language Models
- Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts
- Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations
- Scaling Clinical Trial Matching Using Large Language Models: A Case Study in Oncology
- What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
- RAIN: Your Language Models Can Align Themselves without Finetuning
- When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale
- Hypothesis Search: Inductive Reasoning with Language Models
- Agents: An Open-source Framework for Autonomous Language Agents
- A Paradigm Shift in Machine Translation: Boosting Translation Performance of Large Language Models
- Gated recurrent neural networks discover attention
- Contrastive Decoding Improves Reasoning in Large Language Models
- Clinical Text Summarization: Adapting Large Language Models Can Outperform Human Experts
- FIAT: Fusing learning paradigms with Instruction-Accelerated Tuning
- LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
- Adapting Large Language Models via Reading Comprehension
- DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
- MindAgent: Emergent Gaming Interaction
- Graph Neural Prompting with Large Language Models
- Sparks of Artificial General Intelligence: Early experiments with GPT-4
- AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration
- Efficient Post-training Quantization with FP8 Formats
- Taken out of context: On measuring situational awareness in LLMs
- Jointly Training Large Autoregressive Multimodal Models
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A"
- Curriculum Learning with Adam: The Devil Is in the Wrong Details
- OWL: A Large Language Model for IT Operations
- Faith and Fate: Limits of Transformers on Compositionality
- CodePlan: Repository-level Coding using LLMs and Planning
- GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks
- SCREWS: A Modular Framework for Reasoning with Revisions
- Transformer models: an introduction and catalog
- Small-scale proxies for large-scale Transformer training instabilities
- Effective Long-Context Scaling of Foundation Models
- VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
- Qwen Technical Report
- Attention Approximates Sparse Distributed Memory
- Calibrating LLM-Based Evaluator
- Ambiguity-Aware In-Context Learning with Large Language Models
- GPT-Fathom: Benchmarking Large Language Models to Decipher the Evolutionary Path towards GPT-4 and Beyond
- Vision Transformers Need Registers
- Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic
- Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
- AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
- QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models
- Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
- Evaluating Cognitive Maps and Planning in Large Language Models with CogEval
- Language Modeling Is Compression
- MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
- Aligning Large Multimodal Models with Factually Augmented RLHF
- Large Language Models as Optimizers
- SlimPajama-DC: Understanding Data Combinations for LLM Training
- Finite Scalar Quantization: VQ-VAE Made Simple
- Physics of Language Models: Part 3.2, Knowledge Manipulation
- Efficient Streaming Language Models with Attention Sinks
- The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
- Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution
- LLM-grounded Video Diffusion Models
- Enable Language Models to Implicitly Learn Self-Improvement From Data
- Emergent Analogical Reasoning in Large Language Models
- RA-DIT: Retrieval-Augmented Dual Instruction Tuning
- Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
- Large Language Models Cannot Self-Correct Reasoning Yet
- SmartPlay: A Benchmark for LLMs as Intelligent Agents
- Language Models Represent Space and Time
- Retrieval meets Long Context Large Language Models
- Borges and AI
- Can large language models provide useful feedback on research papers? A large-scale empirical analysis
- Ring Attention with Blockwise Transformers for Near-Infinite Context
- Can Language Models be Instructed to Protect Personal Information?
- QuIP: 2-Bit Quantization of Large Language Models With Guarantees
- Who's Harry Potter? Approximate Unlearning in LLMs
- Low-Resource Languages Jailbreak GPT-4
- DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
- Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning
- EcoAssistant: Using LLM Assistant More Affordably and Accurately
- How FaR Are Large Language Models From Agents with Theory-of-Mind?
- MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning
- Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
- FreshLLMs: Refreshing Large Language Models with Search Engine Augmentation
- HeaP: Hierarchical Policies for Web Actions using LLMs
- A Long Way to Go: Investigating Length Correlations in RLHF
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
- Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors
- Think before you speak: Training Language Models With Pause Tokens
- Mistral 7B
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
- Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
- Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading
- Reliable, Reproducible, and Really Fast Leaderboards with Evalica
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation
- Large Language Models can Learn Rules
- Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency
- Large Language Models Are Zero-Shot Time Series Forecasters
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
- Learning Interactive Real-World Simulators
- FireAct: Toward Language Agent Fine-tuning
- InstructRetro: Instruction Tuning post Retrieval-Augmented Pretraining
- Text Embeddings Reveal (Almost) As Much As Text
- EIPE-text: Evaluation-Guided Iterative Plan Extraction for Long-Form Narrative Text Generation
- A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
- Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models
- Lemur: Harmonizing Natural Language and Code for Language Agents
- LangNav: Language as a Perceptual Representation for Navigation
- The LAMBADA dataset: Word prediction requiring a broad discourse context
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback
- Toward Joint Language Modeling for Speech Units and Text
- MemGPT: Towards LLMs as Operating Systems
- A Zero-Shot Language Agent for Computer Control with Structured Reflection
- LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
- Piccolo2: General Text Embedding with Multi-task Hybrid Loss Training
- CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
- The Consensus Game: Language Model Generation via Equilibrium Search
- Table-GPT: Table-tuned GPT for Diverse Table Tasks
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
- MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
- Arbitrary Length Generalization for Addition
- "I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation
- Deep Learning Scaling is Predictable, Empirically
- MLQA: Evaluating Cross-lingual Extractive Question Answering
- OpenAssistant Conversations -- Democratizing Large Language Model Alignment
- Intersectional Bias in Hate Speech and Abusive Language Datasets
- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
- Reducing malicious use of synthetic media research: Considerations and potential release practices for machine learning
- AI Ethics Issues in Real World: Evidence from AI Incident Database
- Robotic Skill Acquisition via Instruction Augmentation with Vision-Language Models
- BadGPT: Exploring Security Vulnerabilities of ChatGPT via Backdoor Attacks to InstructGPT
- Measuring Mathematical Problem Solving With the MATH Dataset
- Can Machines Learn Morality? The Delphi Experiment
- BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
- UNKs Everywhere: Adapting Multilingual Language Models to New Scripts
- AndroidEnv: A Reinforcement Learning Platform for Android
- Demoting Racial Bias in Hate Speech Detection
- Social Bias Frames: Reasoning about Social and Power Implications of Language
- Characterising Bias in Compressed Models
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
- Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
- Towards Robust Toxic Content Classification
- The Challenge of Value Alignment: from Fairer Algorithms to AI Safety
- Towards Continual Knowledge Learning of Language Models
- The Pushshift Reddit Dataset
- Draft, Sketch, and Prove: Guiding Formal Theorem Provers with Informal Proofs
- Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation
- Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
- Anticipating Safety Issues in E2E Conversational AI: Framework and Tooling
- Build it Break it Fix it for Dialogue Safety: Robustness from Adversarial Human Attack
- Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems
- What's in the Box? A Preliminary Analysis of Undesirable Content in the Common Crawl Corpus
- One Epoch Is All You Need
- Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading
- Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango
- Wav2Letter: an End-to-End ConvNet-based Speech Recognition System
- Plug and Play Language Models: A Simple Approach to Controlled Text Generation
- NewsQA: A Machine Comprehension Dataset
- AmbiPun: Generating Humorous Puns with Ambiguous Context
- Deal or No Deal? End-to-End Learning for Negotiation Dialogues
- Competition-Level Code Generation with AlphaCode
- STaR: Bootstrapping Reasoning With Reasoning
- Efficient Neural Architecture Search via Parameter Sharing
- Recursively Summarizing Books with Human Feedback
- Habitat: A Platform for Embodied AI Research
- Generate & Rank: A Multi-task Framework for Math Word Problems
- Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
- Mitigating Statistical Bias within Differentially Private Synthetic Data
- The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
- RecGPT: Generative Pre-training for Text-based Recommendation
- TruthfulQA: Measuring How Models Mimic Human Falsehoods
- An Empirical Study of Metrics to Measure Representational Harms in Pre-Trained Language Models
- Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks
- Controlling Style in Generated Dialogue
- QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
- Don't Say What You Don't Know: Improving the Consistency of Abstractive Summarization by Constraining Beam Search
- Universal and Independent: Multilingual Probing Framework for Exhaustive Model Interpretation and Evaluation
- DeBERTa: Decoding-enhanced BERT with Disentangled Attention
- Societal Biases in Language Generation: Progress and Challenges
- Counterfactual Fairness in Text Classification through Robustness
- Open-Domain Conversational Agents: Current Progress, Open Problems, and Future Directions
- Deep Double Descent: Where Bigger Models and More Data Hurt
- Neural Generation Meets Real People: Towards Emotionally Engaging Mixed-Initiative Conversations
- InCoder: A Generative Model for Code Infilling and Synthesis
- Back to the Future: On Potential Histories in NLP
- Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
- Sharp Minima Can Generalize For Deep Nets
- Self-attention Does Not Need $O(n^2)$ Memory
- Measuring the Carbon Intensity of AI in Cloud Instances
- SocialIQA: Commonsense Reasoning about Social Interactions
- Generating Long Sequences with Sparse Transformers
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- QAmeleon: Multilingual QA with Only 5 Examples
- CTRL: A Conditional Transformer Language Model for Controllable Generation
- Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models
- Generating Fake Cyber Threat Intelligence Using Transformer-Based Models
- Impact of Pretraining Term Frequencies on Few-Shot Reasoning
- Is neural language acquisition similar to natural? A chronological probing study
- Is ChatGPT Good at Search? Investigating Large Language Models as Re-Ranking Agent
- Buffer Overflow in Mixture of Experts
- OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
- Bag of Tricks for Efficient Text Classification
- Automatic Detection of Machine Generated Text: A Critical Survey
- Adversarial Training for Large Neural Language Models
- Diffsound: Discrete Diffusion Model for Text-to-sound Generation
- TALM: Tool Augmented Language Models
- Training Language Models with Language Feedback
- Toxicity in Multilingual Machine Translation at Scale
- PEER: A Collaborative Language Model
- On the Multilingual Capabilities of Very Large-Scale English Language Models
- LLaMA: Open and Efficient Foundation Language Models
- SECure: A Social and Environmental Certificate for AI Systems
- Gaussian Error Linear Units (GELUs)
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Measuring Massive Multitask Language Understanding
- ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
- To Trust or to Think: Cognitive Forcing Functions Can Reduce Overreliance on AI in AI-assisted Decision-making
- Leveraging QA Datasets to Improve Generative Data Augmentation
- Decoupled Weight Decay Regularization
- A Distributional Approach to Controlled Text Generation
- Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering
- The Turking Test: Can Language Models Understand Instructions?
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
- Language Models (Mostly) Know What They Know
- Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
- Towards Understanding and Mitigating Social Biases in Language Models
- Discovering and Categorising Language Biases in Reddit
- Reducing Sentiment Bias in Language Models via Counterfactual Evaluation
- Training Verifiers to Solve Math Word Problems
- The Curse of Recursion: Training on Generated Data Makes Models Forget
- Compositional Semantic Parsing with Large Language Models
- Transforming Question Answering Datasets Into Natural Language Inference Datasets
- Bringing the People Back In: Contesting Benchmark Machine Learning Datasets
- The Values Encoded in Machine Learning Research
- InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
- Semantically-Aligned Equation Generation for Solving and Reasoning Math Word Problems
- Ethical and social risks of harm from Language Models
- SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
- Understanding HTML with Large Language Models
- ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
- AudioLM: a Language Modeling Approach to Audio Generation
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding | |
Behavior Cloned Transformers are Neurosymbolic Reasoners | |
Adversarial Attacks and Defenses in Images, Graphs and Text: A Review | |
CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models | |
Thou shalt not hate: Countering Online Hate Speech | |
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020) | |
Participation is not a Design Fix for Machine Learning | |
Retrieval Augmentation Reduces Hallucination in Conversation | |
Advancing the State of the Art in Open Domain Dialog Systems through the Alexa Prize | |
How Many Data Samples is an Additional Instruction Worth? | |
Toward Trustworthy AI Development: Mechanisms for Supporting Verifiable Claims | |
Crosslingual Generalization through Multitask Finetuning | |
The Curious Case of Neural Text Degeneration | |
UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction | |
VinaLLaMA: LLaMA-based Vietnamese Foundation Model | |
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference | |
Evaluating the Social Impact of Generative AI Systems in Systems and Society | |
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference | |
Towards A Rigorous Science of Interpretable Machine Learning | |
An Analysis of the Automatic Bug Fixing Performance of ChatGPT | |
Investigating Failures of Automatic Translation in the Case of Unambiguous Gender | |
Chat as Expected: Learning to Manipulate Black-box Neural Dialogue Models | |
Defending Against Neural Fake News | |
Analyzing Dynamic Adversarial Training Data in the Limit | |
Criticality in Formal Languages and Statistical Physics | |
Generating Wikipedia by Summarizing Long Sequences | |
Gender Bias in Contextualized Word Embeddings | |
MS MARCO: A Human Generated MAchine Reading COmprehension Dataset | |
Deep Generative Dual Memory Network for Continual Learning | |
ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation | |
Persistent Anti-Muslim Bias in Large Language Models | |
Mirages: On Anthropomorphism in Dialogue Systems | |
Deep Learning for Symbolic Mathematics | |
Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents | |
A Survey On Universal Adversarial Attack | |
Atlas: Few-shot Learning with Retrieval Augmented Language Models | |
StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding | |
Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning | |
A framework for the extraction of Deep Neural Networks by leveraging public data | |
Recipes for building an open-domain chatbot | |
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent | |
Measuring the Effects of Data Parallelism on Neural Network Training | |
ChatGPT Makes Medicine Easy to Swallow: An Exploratory Case Study on Simplified Radiology Reports | |
Kosmos-G: Generating Images in Context with Multimodal Large Language Models | |
X-SQL: reinforce schema representation with context | |
Constructing Datasets for Multi-hop Reading Comprehension Across Documents | |
FastText.zip: Compressing text classification models | |
The State and Fate of Linguistic Diversity and Inclusion in the NLP World | |
A General Language Assistant as a Laboratory for Alignment | |
Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention | |
Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly | |
Transformer tricks: Precomputing the first layer | |
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms | |
Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech | |
Estimating the Carbon Footprint of BLOOM, a 176B Parameter Language Model | |
Enhancing the Transformer with Explicit Relational Encoding for Math Problem Solving | |
Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection | |
Deep Learning Based Text Classification: A Comprehensive Review | |
Automated Hate Speech Detection and the Problem of Offensive Language | |
Multi-Dimensional Gender Bias Classification | |
Extracting Training Data from Large Language Models | |
ProsocialDialog: A Prosocial Backbone for Conversational Agents | |
Cross-Task Generalization via Natural Language Crowdsourcing Instructions | |
SPLADE-v3: New baselines for SPLADE | |
Learning from the Worst: Dynamically Generated Datasets to Improve Online Hate Detection | |
FlowQA: Grasping Flow in History for Conversational Machine Comprehension | |
Recent Advances towards Safe, Responsible, and Moral Dialogue Systems: A Survey | |
Improving alignment of dialogue agents via targeted human judgements | |
Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing | |
DateLogicQA: Benchmarking Temporal Biases in Large Language Models | |
Explanation in Artificial Intelligence: Insights from the Social Sciences | |
RoBERTa: A Robustly Optimized BERT Pretraining Approach | |
Revealing Persona Biases in Dialogue Systems | |
GeDi: Generative Discriminator Guided Sequence Generation | |
Is ChatGPT better than Human Annotators? Potential and Limitations of ChatGPT in Explaining Implicit Hate Speech | |
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering | |
UL2: Unifying Language Learning Paradigms | |
Self-Instruct: Aligning Language Models with Self-Generated Instructions | |
Evaluating the Underlying Gender Bias in Contextualized Word Embeddings | |
Does Gender Matter? Towards Fairness in Dialogue Systems | |
Energy and Policy Considerations for Deep Learning in NLP | |
Tools Fail: Detecting Silent Errors in Faulty Tools | |
The False Promise of Imitating Proprietary LLMs | |
Directional Bias Amplification | |
Hierarchical Text-Conditional Image Generation with CLIP Latents | |
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection | |
ACUTE-EVAL: Improved Dialogue Evaluation with Optimized Questions and Multi-turn Comparisons | |
Task-aware Retrieval with Instructions | |
Do Prompt-Based Models Really Understand the Meaning of their Prompts? | |
Reading Wikipedia to Answer Open-Domain Questions | |
Supervising Model Attention with Human Explanations for Robust Natural Language Inference | |
Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis | |
Latent Retrieval for Weakly Supervised Open Domain Question Answering | |
Teaching language models to support answers with verified quotes | |
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension | |
MasakhaNER: Named Entity Recognition for African Languages | |
Predicting the Type and Target of Offensive Posts in Social Media | |
Learning to Model Editing Processes | |
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model | |
Rainier: Reinforced Knowledge Introspector for Commonsense Question Answering | |
Zero-Shot Fine-Grained Style Transfer: Leveraging Distributed Continuous Style Representations to Transfer To Unseen Styles | |
Quantifying the Carbon Emissions of Machine Learning | |
Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping | |
Chasing Carbon: The Elusive Environmental Footprint of Computing | |
Language Models that Seek for Knowledge: Modular Search & Generation for Dialogue and Prompt Completion | |
Distilling Reasoning Capabilities into Smaller Language Models | |
Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning | |
Scaling Language Models: Methods, Analysis & Insights from Training Gopher | |
CodeNet: A Large-Scale AI for Code Dataset for Learning a Diversity of Coding Tasks | |
WebGPT: Browser-assisted question-answering with human feedback | |
Making Large Language Models Better Reasoners with Step-Aware Verifier | |
Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books | |
SGPT: GPT Sentence Embeddings for Semantic Search | |
Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Arbitrary Textual Style Transfer with Small Language Models | |
Building a Conversational Agent Overnight with Dialogue Self-Play | |
ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks | |
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets | |
A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection | |
Neural Machine Translation of Rare Words with Subword Units | |
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection | |
Queens are Powerful too: Mitigating Gender Bias in Dialogue Generation | |
Tokenisation is NP-Complete | |
TeraPipe: Token-Level Pipeline Parallelism for Training Large-Scale Language Models | |
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge | |
Know What You Don't Know: Unanswerable Questions for SQuAD | |
Longformer: The Long-Document Transformer | |
Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus | |
A Constructive Prediction of the Generalization Error Across Scales | |
Detecting Emergent Intersectional Biases: Contextualized Word Embeddings Contain a Distribution of Human-like Biases | |
KERMIT: Generative Insertion-Based Modeling for Sequences | |
mGPT: Few-Shot Learners Go Multilingual | |
The Natural Language Decathlon: Multitask Learning as Question Answering | |
A Crowd-based Evaluation of Abuse Response Strategies in Conversational Agents | |
A Survey of Race, Racism, and Anti-Racism in NLP | |
Unraveling the Hidden Environmental Impacts of AI Solutions for Environment | |
SemEval-2022 Task 2: Multilingual Idiomaticity Detection and Sentence Embedding | |
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter | |
Improving the Domain Adaptation of Retrieval Augmented Generation (RAG) Models for Open Domain Question Answering | |
Hyperbolic Image-Text Representations | |
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey | |
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models | |
Pretraining Language Models with Human Preferences | |
Racial Disparity in Natural Language Processing: A Case Study of Social Media African-American English | |
MTEB: Massive Text Embedding Benchmark | |
Interscript: A dataset for interactive learning of scripts through error feedback | |
Looped Transformers as Programmable Computers | |
Inner Monologue: Embodied Reasoning through Planning with Language Models | |
No Language Left Behind: Scaling Human-Centered Machine Translation | |
Collaborative Storytelling with Large-scale Neural Language Models | |
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge | |
CONDAQA: A Contrastive Reading Comprehension Dataset for Reasoning about Negation | |
Recipes for Safety in Open-domain Chatbots | |
Can Large Language Models Explain Themselves? A Study of LLM-Generated Self-Explanations | |
Pre-Trained Language Models for Interactive Decision-Making | |
Can Large Language Models Really Improve by Self-critiquing Their Own Plans? | |
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | |
Formal Algorithms for Transformers | |
An Emulator for Fine-Tuning Large Language Models using Small Language Models | |
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | |
Democratizing Reasoning Ability: Tailored Learning from Large Language Model | |
HellaSwag: Can a Machine Really Finish Your Sentence? | |
Teaching Language Models to Self-Improve through Interactive Demonstrations | |
Ranking LLM-Generated Loop Invariants for Program Verification | |
Approximating Two-Layer Feedforward Networks for Efficient Transformers | |
Question and Answer Test-Train Overlap in Open-Domain Question Answering Datasets | |
When can transformers reason with abstract symbols? | |
HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(ision), LLaVA-1.5, and Other Multi-modality Models | |
Language Models are Few-shot Multilingual Learners | |
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | |
AutoMix: Automatically Mixing Language Models | |
Let's Synthesize Step by Step: Iterative Dataset Synthesis with Large Language Models by Extrapolating Errors from Small Models | |
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V | |
Pre-trained Summarization Distillation | |
TEQ: Trainable Equivalent Transformation for Quantization of LLMs | |
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning | |
Improving Large Language Model Fine-tuning for Solving Math Problems | |
Language Models are General-Purpose Interfaces | |
Llemma: An Open Language Model For Mathematics | |
Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners | |
Gender Bias in Machine Translation | |
Towards a Human-like Open-Domain Chatbot | |
NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation | |
A Network-based End-to-End Trainable Task-oriented Dialogue System | |
Safe RLHF: Safe Reinforcement Learning from Human Feedback | |
Cloze-driven Pretraining of Self-attention Networks | |
Universal Language Model Fine-tuning for Text Classification | |
OPT: Open Pre-trained Transformer Language Models | |
Towards Zero-Label Language Learning | |
GPT-4 Doesn't Know It's Wrong: An Analysis of Iterative Prompting for Reasoning Problems | |
MusicAgent: An AI Agent for Music Understanding and Generation with Large Language Models | |
Learning and Leveraging Verifiers to Improve Planning Capabilities of Pre-trained Language Models | |
Fine-tuned Language Models are Continual Learners | |
3D-GPT: Procedural 3D Modeling with Large Language Models | |
PAL: Program-aided Language Models | |
Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning | |
Large Language Models for Software Engineering: Survey and Open Problems | |
Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots | |
Self-critiquing models for assisting human evaluators | |
Towards Understanding Sycophancy in Language Models | |
SALMONN: Towards Generic Hearing Abilities for Large Language Models | |
Finetuned Language Models Are Zero-Shot Learners | |
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them | |
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search | |
Generating Sequences by Learning to Self-Correct | |
The Depth-to-Width Interplay in Self-Attention | |
Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning | |
Internet-augmented language models through few-shot prompting for open-domain question answering | |
GLM-130B: An Open Bilingual Pre-trained Model | |
Three scenarios for continual learning | |
Eureka: Human-Level Reward Design via Coding Large Language Models | |
GPT-NeoX-20B: An Open-Source Autoregressive Language Model | |
An Explanation of In-context Learning as Implicit Bayesian Inference | |
AgentTuning: Enabling Generalized Agent Abilities for LLMs | |
Snapshot Ensembles: Train 1, get M for free | |
Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model | |
On the Planning Abilities of Large Language Models -- A Critical Investigation | |
Efficient Estimation of Word Representations in Vector Space | |
Visualizing the Loss Landscape of Neural Nets | |
Contrastive Preference Learning: Learning from Human Feedback without RL | |
High-Resolution Image Synthesis with Latent Diffusion Models | |
I love your chain mail! Making knights smile in a fantasy game world: Open-domain goal-oriented dialogue agents | |
H2O Open Ecosystem for State-of-the-art Large Language Models | |
Calibrate Before Use: Improving Few-Shot Performance of Language Models | |
All-in-One Image-Grounded Conversational Agents | |
Interactive Task Planning with Language Models | |
Can AI-Generated Text be Reliably Detected? | |
BitNet: Scaling 1-bit Transformers for Large Language Models | |
Scaling Laws for Neural Language Models | |
Self-Refine: Iterative Refinement with Self-Feedback | |
Adversarial Environment Generation for Learning to Navigate the Web | |
Cross-Lingual Language Model Meta-Pretraining | |
Creative Robot Tool Use with Large Language Models | |
Simple and Effective Multi-Paragraph Reading Comprehension | |
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection | |
VeRA: Vector-based Random Matrix Adaptation | |
Open-Ended Learning Leads to Generally Capable Agents | |
Exploring the Boundaries of GPT-4 in Radiology | |
Ensemble-Instruct: Generating Instruction-Tuning Data with a Heterogeneous Mixture of LMs | |
High-Dimensional Continuous Control Using Generalized Advantage Estimation | |
Vision-Language Models are Zero-Shot Reward Models for Reinforcement Learning | |
CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion | |
Eliciting Human Preferences with Language Models | |
One-Shot Learning from a Demonstration with Hierarchical Latent Language | |
OpenAgents: An Open Platform for Language Agents in the Wild | |
Branch-Solve-Merge Improves Large Language Model Evaluation and Generation | |
Specific versus General Principles for Constitutional AI | |
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | |
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | |
Task2Vec: Task Embedding for Meta-Learning | |
Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams | |
Tuna: Instruction Tuning using Feedback from Large Language Models | |
In-Context Pretraining: Language Modeling Beyond Document Boundaries | |
Self-Consistency Improves Chain of Thought Reasoning in Language Models | |
Transcending Scaling Laws with 0.1% Extra Compute | |
InstructExcel: A Benchmark for Natural Language Instruction in Excel | |
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing | |
Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning | |
A Downsampled Variant of ImageNet as an Alternative to the CIFAR datasets | |
Understanding Retrieval Augmentation for Long-Form Question Answering | |
A Neural Conversational Model | |
Exploring the Limits of Language Modeling | |
Scaling Instruction-Finetuned Language Models | |
Learning Performance-Improving Code Edits | |
Training Compute-Optimal Large Language Models | |
Instruction Tuning with GPT-4 | |
Holistic Evaluation of Language Models | |
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models | |
Large Language Models as Analogical Reasoners | |
Negative Training for Neural Dialogue Response Generation | |
On the Opportunities and Risks of Foundation Models | |
Dissecting In-Context Learning of Translations in GPTs | |
Carbon Emissions and Large Neural Network Training | |
Faithful Reasoning Using Large Language Models | |
Detecting Pretraining Data from Large Language Models | |
Motif: Intrinsic Motivation from Artificial Intelligence Feedback | |
Unified Language Model Pre-training for Natural Language Understanding and Generation | |
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model | |
Predictability and Surprise in Large Generative Models | |
Alignment of Language Agents | |
Zephyr: Direct Distillation of LM Alignment | |
Binding Language Models in Symbolic Languages | |
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | |
The Evolved Transformer | |
Detecting Hate Speech with GPT-3 | |
Learning to summarize from human feedback | |
Efficient Large Scale Language Modeling with Mixtures of Experts | |
Jailbreaking Black Box Large Language Models in Twenty Queries | |
How do Language Models Bind Entities in Context? | |
Program Synthesis with Large Language Models | |
Challenges in Detoxifying Language Models | |
A Deep Reinforced Model for Abstractive Summarization | |
Moral Foundations of Large Language Models | |
Training Production Language Models without Memorizing User Data | |
A Deep Reinforcement Learning Chatbot | |
RT-1: Robotics Transformer for Real-World Control at Scale | |
Entity Tracking in Language Models | |
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval | |
Controlled Decoding from Language Models | |
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | |
FP8-LM: Training FP8 Large Language Models | |
The Perils & Promises of Fact-checking with Large Language Models | |
Imitation versus Innovation: What children can do that large language and language-and-vision models cannot (yet)? | |
Unsolved Problems in ML Safety | |
Woodpecker: Hallucination Correction for Multimodal Large Language Models | |
A Framework for Automated Measurement of Responsible AI Harms in Generative AI Applications | |
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | |
Data-Centric Financial Large Language Models | |
CodeFusion: A Pre-trained Diffusion Model for Code Generation | |
TRAMS: Training-free Memory Selection for Long-range Language Modeling | |
Personas as a Way to Model Truthfulness in Language Models | |
PockEngine: Sparse and Efficient Fine-tuning in a Pocket | |
LLM-FP4: 4-Bit Floating-Point Quantized Transformers | |
CLEX: Continuous Length Extrapolation for Large Language Models | |
ALCUNA: Large Language Models Meet New Knowledge | |
JudgeLM: Fine-tuned Large Language Models are Scalable Judges | |
Large Language Models as Generalizable Policies for Embodied Tasks | |
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers | |
ControlLLM: Augment Language Models with Tools by Searching on Graphs | |
Linear Representations of Sentiment in Large Language Models | |
LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | |
The Generative AI Paradox: "What It Can Create, It May Not Understand" | |
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | |
MM-VID: Advancing Video Understanding with GPT-4V(ision) | |
ChatCoder: Chat-based Refine Requirement Improves LLMs' Code Generation | |
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V | |
LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing | |
ChipNeMo: Domain-Adapted LLMs for Chip Design | |
What's In My Big Data? | |
Multitasking Models are Robust to Structural Failure: A Neural Model for Bilingual Cognitive Reserve | |
Idempotent Generative Network | |
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning | |
Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation | |
Leveraging Word Guessing Games to Assess the Intelligence of Large Language Models | |
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans? | |
TeacherLM: Teaching to Fish Rather Than Giving the Fish, Language Modeling Likewise | |
NEFTune: Noisy Embeddings Improve Instruction Finetuning | |
The Impact of Depth and Width on Transformer Language Model Generalization | |
FlashDecoding++: Faster Large Language Model Inference on GPUs | |
Skywork: A More Open Bilingual Foundation Model | |
GRIM: GRaph-based Interactive narrative visualization for gaMes | |
LoRAShear: Efficient Large Language Model Structured Pruning and Knowledge Recovery | |
Does GPT-4 Pass the Turing Test? | |
Text Rendering Strategies for Pixel Language Models | |
Distil-Whisper: Robust Knowledge Distillation via Large-Scale Pseudo Labelling | |
Learning From Mistakes Makes LLM Better Reasoner | |
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning | |
Everything of Thoughts: Defying the Law of Penrose Triangle for Thought Generation | |
Ultra-Long Sequence Distributed Transformer | |
Ziya2: Data-centric Learning is All LLMs Need | |
GLaMM: Pixel Grounding Large Multimodal Model | |
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | |
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving | |
Unveiling Safety Vulnerabilities of Large Language Models | |
Prompt Cache: Modular Attention Reuse for Low-Latency Inference | |
Levels of AGI: Operationalizing Progress on the Path to AGI | |
u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model | |
Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning | |
Co-training and Co-distillation for Quality Improvement and Compression of Language Models | |
CogVLM: Visual Expert for Pretrained Language Models | |
Tailoring Self-Rationalizers with Multi-Reward Distillation | |
NExT-Chat: An LMM for Chat, Detection and Segmentation | |
The Efficiency Misnomer | |
PPTC Benchmark: Evaluating Large Language Models for PowerPoint Task Completion | |
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs | |
Training Dynamics of Contextual N-Grams in Language Models | |
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents | |
Large Language Models Understand and Can be Enhanced by Emotional Stimuli | |
Gzip versus bag-of-words for text classification | |
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models | |
GPT4All: An Ecosystem of Open Source Compressed Language Models | |
Evaluating Large Language Models: A Comprehensive Survey | |
Leveraging Large Language Models for Automated Proof Synthesis in Rust | |
GPTScore: Evaluate as You Desire | |
CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | |
S-LoRA: Serving Thousands of Concurrent LoRA Adapters | |
Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency | |
Finding Neurons in a Haystack: Case Studies with Sparse Probing | |
Simple and Controllable Music Generation | |
Can LLMs Follow Simple Rules? | |
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM | |
Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models | |
MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning | |
Memory Augmented Language Models through Mixture of Word Experts | |
Language Models can be Logical Solvers | |
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models | |
ADaPT: As-Needed Decomposition and Planning with Language Models | |
FinGPT: Large Generative Models for a Small Language | |
Simplifying Transformer Blocks | |
Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs | |
Prompt Engineering a Prompt Engineer | |
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions | |
Rephrase and Respond: Let Large Language Models Ask Better Questions for Themselves | |
Accelerating Large Language Model Decoding with Speculative Sampling | |
Alternating Updates for Efficient Transformers | |
White-Box Transformers via Sparse Rate Reduction | |
ChatAnything: Facetime Chat with LLM-Enhanced Personas | |
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data | |
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4 | |
LayoutPrompter: Awaken the Design Ability of Large Language Models | |
Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations | |
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation | |
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning | |
Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text | |
SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | |
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models | |
Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer | |
Trusted Source Alignment in Large Language Models | |
UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations | |
MEGAVERSE: Benchmarking Large Language Models Across Languages, Modalities, Models and Tasks | |
Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?" | |
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | |
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure | |
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | |
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming | |
The ART of LLM Refinement: Ask, Refine, and Trust | |
Fine-tuning Language Models for Factuality | |
A Survey on Language Models for Code | |
DiLoCo: Distributed Low-Communication Training of Language Models | |
ML-Bench: Large Language Models Leverage Open-source Libraries for Machine Learning Tasks | |
Fusion-Eval: Integrating Evaluators with LLMs | |
PEARL: Personalizing Large Language Model Writing Assistants with Generation-Calibrated Retrievers | |
SiRA: Sparse Mixture of Low Rank Adaptation | |
Open-Sourcing Highly Capable Foundation Models: An evaluation of risks, benefits, and alternative methods for pursuing open-source objectives | |
Llamas Know What GPTs Don't Show: Surrogate Models for Confidence Estimation | |
UT5: Pretraining Non-autoregressive T5 with unrolled denoising | |
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models | |
Tied-Lora: Enhancing parameter efficiency of LoRA with weight tying | |
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models | |
Contrastive Chain-of-Thought Prompting | |
Learning to Filter Context for Retrieval-Augmented Generation | |
Large Language Models for Automated Open-domain Scientific Hypotheses Discovery | |
M$^{2}$UGen: Multi-modal Music Understanding and Generation with the Power of Large Language Models | |
System 2 Attention (is something you might need too) | |
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration | |
Language Models are Multilingual Chain-of-Thought Reasoners | |
ProAgent: From Robotic Process Automation to Agentic Process Automation | |
Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers | |
Exponentially Faster Language Modelling | |
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 | |
ToolTalk: Evaluating Tool-Usage in a Conversational Setting | |
Testing Language Model Agents Safely in the Wild | |
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort | |
MultiLoRA: Democratizing LoRA for Better Multi-Task Learning | |
Orca 2: Teaching Small Language Models How to Reason | |
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections | |
On Leakage of Code Generation Evaluation Datasets | |
GPQA: A Graduate-Level Google-Proof Q&A Benchmark | |
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | |
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning | |
SelfEval: Leveraging the discriminative nature of generative models for evaluation | |
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems | |
UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework | |
LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores | |
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models | |
HiPPO: Recurrent Memory with Optimal Polynomial Projections | |
Transformer Memory as a Differentiable Search Index | |
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators | |
DeiT III: Revenge of the ViT | |
Scaling Vision Transformers to 22 Billion Parameters | |
On Calibration of Modern Neural Networks | |
A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks | |
MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers | |
Attention Is All You Need | |
Acceleration via Fractal Learning Rate Schedules | |
Transformers learn in-context by gradient descent | |
HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models | |
Toy Models of Superposition | |
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis | |
Unified Scaling Laws for Routed Language Models | |
CLIPPO: Image-and-Language Understanding from Pixels Only | |
Task-Specific Skill Localization in Fine-tuned Language Models | |
Discovering Latent Knowledge in Language Models Without Supervision | |
OCR-free Document Understanding Transformer | |
Language Models are Few-Shot Learners | |
Progress measures for grokking via mechanistic interpretability | |
Learning Transferable Visual Models From Natural Language Supervision | |
Zero-Shot Text-to-Image Generation | |
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models | |
muNet: Evolving Pretrained Deep Neural Networks into Scalable Auto-tuning Multitask Systems | |
Language Models as Agent Models | |
Learning Models of Individual Behavior in Chess | |
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning | |
Ask Me Anything: A simple strategy for prompting language models | |
Training language models to follow instructions with human feedback | |
Sequence to Sequence Learning with Neural Networks | |
SegGPT: Segmenting Everything In Context | |
A data-driven approach for learning to control computers | |
Lossless Adaptation of Pretrained Vision Models For Robotic Manipulation | |
Unifying Vision, Text, and Layout for Universal Document Processing | |
Memorizing Transformers | |
GateLoop: Fully Data-Controlled Linear Recurrence for Sequence Modeling | |
Beyond Memorization: Violating Privacy Via Inference with Large Language Models | |
A Succinct Summary of Reinforcement Learning | |
Symbolic Discovery of Optimization Algorithms | |
Confronting Reward Model Overoptimization with Constrained RLHF | |
Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation | |
A Cookbook of Self-Supervised Learning | |
Training Language Models with Language Feedback at Scale | |
Answering Questions by Meta-Reasoning over Multiple Chains of Thought | |
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment | |
SemDeDup: Data-efficient learning at web-scale through semantic deduplication | |
Adversarial Examples for Evaluating Reading Comprehension Systems | |
Counterfactual Interventions Reveal the Causal Effect of Relative Clause Representations on Agreement Prediction | |
Demonstrate-Search-Predict: Composing retrieval and language models for knowledge-intensive NLP | |
LQ-LoRA: Low-rank Plus Quantized Matrix Decomposition for Efficient Language Model Finetuning | |
ImageBind: One Embedding Space To Bind Them All | |
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks | |
Scaling Data-Constrained Language Models | |
Efficient LLM Inference on CPUs | |
Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models | |
Efficiently Scaling Transformer Inference | |
One Model To Learn Them All | |
Brain decoding: toward real-time reconstruction of visual perception | |
GLU Variants Improve Transformer | |
Vision Transformers with Mixed-Resolution Tokenization | |
HyperNetworks | |
InRank: Incremental Low-Rank Learning | |
Text-to-Image Diffusion Models are Zero-Shot Classifiers | |
CoBIT: A Contrastive Bi-directional Image-Text Generation Model | |
MAGVLT: Masked Generative Vision-and-Language Transformer | |
DINOv2: Learning Robust Visual Features without Supervision | |
What learning algorithm is in-context learning? Investigations with linear models | |
Any-to-Any Generation via Composable Diffusion | |
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints | |
Shortformer: Better Language Modeling using Shorter Inputs | |
Real-Time Evaluation Models for RAG: Who Detects Hallucinations Best? | |
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity | |
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture | |
PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
The alignment problem from a deep learning perspective | |
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | |
Jailbreaking is Best Solved by Definition | |
Multimodal Analogical Reasoning over Knowledge Graphs | |
Segment Everything Everywhere All at Once | |
DocPrompting: Generating Code by Retrieving the Docs | |
Emergent Tool Use From Multi-Agent Autocurricula | |
Root Mean Square Layer Normalization | |
TeCH: Text-guided Reconstruction of Lifelike Clothed Humans | |
Efficient Training of Language Models to Fill in the Middle | |
AI for Mathematics: A Cognitive Science Perspective | |
AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators | |
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? | |
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | |
The First Room-Temperature Ambient-Pressure Superconductor | |
Segment Anything | |
Less is More: Parameter-Free Text Classification with Gzip | |
Sketchy: Memory-efficient Adaptive Regularization with Frequent Directions | |
A Generalist Agent | |
Meet in the Middle: A New Pre-training Paradigm | |
Learning Fast Algorithms for Linear Transforms Using Butterfly Factorizations | |
Can Humans Do Less-Than-One-Shot Learning? | |
Diffusion-LM Improves Controllable Text Generation | |
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking | |
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets | |
Text-to-3D using Gaussian Splatting | |
Precise Zero-Shot Dense Retrieval without Relevance Labels | |
Brainformers: Trading Simplicity for Efficiency | |
DETRs Beat YOLOs on Real-time Object Detection | |
OtterHD: A High-Resolution Multi-modality Model | |
Rethinking the Role of Token Retrieval in Multi-Vector Retrieval | |
ConvNets Match Vision Transformers at Scale | |
Domain Specific Question Answering Over Knowledge Graphs Using Logical Programming and Large Language Models | |
Scaling Robot Learning with Semantically Imagined Experience | |
Do LLMs exhibit human-like response biases? A case study in survey design | |
READ: Recurrent Adaptation of Large Transformers | |
Benchmarking Neural Network Training Algorithms | |
Automatic Gradient Descent: Deep Learning without Hyperparameters | |
Layer Normalization | |
An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | |
Implicit Representations of Meaning in Neural Language Models | |
Calibrated Chaos: Variance Between Runs of Neural Network Training is Harmless and Inevitable | |
SqueezeLLM: Dense-and-Sparse Quantization | |
Optimisation & Generalisation in Networks of Neurons | |
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals | |
Transformers as Recognizers of Formal Languages: A Survey on Expressivity | |
The effectiveness of MAE pre-pretraining for billion-scale pretraining | |
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks | |
Decoupled Context Processing for Context Augmented Language Modeling | |
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | |
The Transient Nature of Emergent In-Context Learning in Transformers | |
Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning | |
Matryoshka Diffusion Models | |
Show Your Work: Scratchpads for Intermediate Computation with Language Models | |
Beyond neural scaling laws: beating power law scaling via data pruning | |
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? | |
Going Deeper with Convolutions | |
TimeGPT-1 | |
Capabilities of GPT-4 on Medical Challenge Problems | |
Training Large Language Models Efficiently with Sparsity and Dataflow | |
Optimal Policies Tend to Seek Power | |
A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity | |
Thinking Like Transformers | |
Why think step by step? Reasoning emerges from the locality of experience | |
Mixture-of-Experts with Expert Choice Routing | |
GPT-4 Technical Report | |
Scaling Expert Language Models with Unsupervised Domain Discovery | |
End-to-End Spatio-Temporal Action Localisation with Video Transformers | |
Mass-Editing Memory in a Transformer | |
Erasing Concepts from Diffusion Models | |
Physics of Language Models: Part 1, Context-Free Grammar | |
Flamingo: a Visual Language Model for Few-Shot Learning | |
Towards CausalGPT: A Multi-Agent Approach for Faithful Knowledge Reasoning via Promoting Causal Consistency in LLMs | |
Semantic Tokenizer for Enhanced Natural Language Processing | |
On Limitations of the Transformer Architecture | |
A Survey of Large Language Models | |
Affordances from Human Videos as a Versatile Representation for Robotics | |
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | |
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution | |
Conditioning Predictive Models: Risks and Strategies | |
Implicit Chain of Thought Reasoning via Knowledge Distillation | |
Scaling Laws for Transfer | |
Risks from Learned Optimization in Advanced Machine Learning Systems | |
SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression | |
Bayesian Optimization of Catalysts With In-context Learning | |
Teach LLMs to Phish: Stealing Private Information from Language Models | |
LLMatic: Neural Architecture Search via Large Language Models and Quality Diversity Optimization | |
Knowledge Graphs | |
Language Modelling with Pixels | |
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | |
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | |
Chinchilla Scaling: A replication attempt | |
Retrofitting Word Vectors to Semantic Lexicons | |
CoLT5: Faster Long-Range Transformers with Conditional Computation | |
Deep contextualized word representations | |
Boosted Prompt Ensembles for Large Language Models | |
Recurrent Memory Transformer | |
Multitask Prompted Training Enables Zero-Shot Task Generalization | |
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs | |
Monarch: Expressive Structured Matrices for Efficient and Accurate Training | |
On the Turing Completeness of Modern Neural Network Architectures | |
Generalized Out-of-Distribution Detection: A Survey | |
AugGPT: Leveraging ChatGPT for Text Data Augmentation | |
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | |
SLiC-HF: Sequence Likelihood Calibration with Human Feedback | |
StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models | |
Drag Your GAN: Interactive Point-based Manipulation on the Generative Image Manifold | |
Human-Timescale Adaptation in an Open-Ended Task Space | |
Sigmoid Loss for Language Image Pre-Training | |
OpenScene: 3D Scene Understanding with Open Vocabularies | |
Nougat: Neural Optical Understanding for Academic Documents | |
SoundStorm: Efficient Parallel Audio Generation | |
Text and Code Embeddings by Contrastive Pre-Training | |
Fine-Tuning Language Models from Human Preferences | |
ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT | |
Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models | |
Effective Theory of Transformers at Initialization | |
ST-MoE: Designing Stable and Transferable Sparse Expert Models | |
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | |
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models | |
Natural Selection Favors AIs over Humans | |
ART: Automatic multi-step reasoning and tool-use for large language models | |
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection | |
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models | |
Visual Instruction Tuning | |
Efficiently Modeling Long Sequences with Structured State Spaces | |
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges | |
Mastering Diverse Domains through World Models | |
Simplified State Space Layers for Sequence Modeling | |
Offline RL for Natural Language Generation with Implicit Language Q Learning | |
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
Deduplicating Training Data Mitigates Privacy Risks in Language Models | |
Self-supervised Learning: Generative or Contrastive | |
Towards Automated Circuit Discovery for Mechanistic Interpretability | |
Neural Story Planning | |
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | |
Vera: A General-Purpose Plausibility Estimation Model for Commonsense Statements | |
Dota 2 with Large Scale Deep Reinforcement Learning | |
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability | |
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head | |
The Matrix Calculus You Need For Deep Learning | |
ChatGPT is a Knowledgeable but Inexperienced Solver: An Investigation of Commonsense Problem in Large Language Models | |
DeepNet: Scaling Transformers to 1,000 Layers | |
SparseFormer: Sparse Visual Recognition via Limited Latent Tokens | |
Whose Language Counts as High Quality? Measuring Language Ideologies in Text Data Selection | |
LLMs cannot find reasoning errors, but can correct them! | |
Pretraining Without Attention | |
Large language models are not zero-shot communicators | |
Semi-supervised Sequence Learning | |
Improving language models by retrieving from trillions of tokens | |
Synthetic Data from Diffusion Models Improves ImageNet Classification | |
Level Generation Through Large Language Models | |
How Does Generative Retrieval Scale to Millions of Passages? | |
State Spaces Aren't Enough: Machine Translation Needs Attention | |
Data Distributional Properties Drive Emergent In-Context Learning in Transformers | |
Evaluating Large Language Models Trained on Code | |
Injecting structural hints: Using language models to study inductive biases in language learning | |
The case for 4-bit precision: k-bit Inference Scaling Laws | |
Divide-or-Conquer? Which Part Should You Distill Your LLM? | |
Downstream Datasets Make Surprisingly Good Pretraining Corpora | |
ChatGPT or Grammarly? Evaluating ChatGPT on Grammatical Error Correction Benchmark | |
Fast Transformer Decoding: One Write-Head is All You Need | |
NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities | |
Towards Deep Learning Models Resistant to Adversarial Attacks | |
A Practical Deep Learning-Based Acoustic Side Channel Attack on Keyboards | |
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok | |
Large Language Models as General Pattern Machines | |
Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models | |
Fast and forward stable randomized algorithms for linear least-squares problems | |
Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training | |
Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models | |
Twist Decoding: Diverse Generators Guide Each Other | |
Monolith: Real Time Recommendation System With Collisionless Embedding Table | |
On-Device Training Under 256KB Memory | |
Meta-Learning in Neural Networks: A Survey | |
The Linear Representation Hypothesis and the Geometry of Large Language Models | |
The Power of Scale for Parameter-Efficient Prompt Tuning | |
LongForm: Optimizing Instruction Tuning for Long Text Generation with Corpus Extraction | |
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention | |
Why Can GPT Learn In-Context? Language Models Implicitly Perform Gradient Descent as Meta-Optimizers | |
GLM: General Language Model Pretraining with Autoregressive Blank Infilling | |
Human Preference Score: Better Aligning Text-to-Image Models with Human Preference | |
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning | |
Spreading vectors for similarity search | |
REFINER: Reasoning Feedback on Intermediate Representations | |
Learning to Learn Faster from Human Feedback with Language Model Predictive Control | |
Low-code LLM: Visual Programming over LLMs | |
Decoding speech perception from non-invasive brain recordings | |
Towards Agile Text Classifiers for Everyone | |
Cramming: Training a Language Model on a Single GPU in One Day | |
Text-to-Table: A New Way of Information Extraction | |
TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP | |
WizardLM: Empowering Large Language Models to Follow Complex Instructions | |
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | |
ViperGPT: Visual Inference via Python Execution for Reasoning | |
Spatial-Language Attention Policies for Efficient Robot Learning | |
Improved Baselines with Visual Instruction Tuning | |
Decision Transformer: Reinforcement Learning via Sequence Modeling | |
What Algorithms can Transformers Learn? A Study in Length Generalization | |
Tracking Everything Everywhere All at Once | |
Bad Global Minima Exist and SGD Can Reach Them | |
Directly Fine-Tuning Diffusion Models on Differentiable Rewards | |
Fine-Tuning LLaMA for Multi-Stage Text Retrieval | |
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | |
EVA-CLIP: Improved Training Techniques for CLIP at Scale | |
Optimizing Memory Mapping Using Deep Reinforcement Learning | |
A General Theoretical Paradigm to Understand Learning from Human Preferences | |
Beyond Words: A Comprehensive Survey of Sentence Representations | |
Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | |
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought | |
Adding Gradient Noise Improves Learning for Very Deep Networks | |
Positional Description Matters for Transformers Arithmetic | |
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up? | |
Calibrated Language Models Must Hallucinate | |
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks | |
Phenomenal Yet Puzzling: Testing Inductive Reasoning Capabilities of Language Models with Hypothesis Refinement | |
Online Decision Transformer | |
Benchmarking Large Language Models for News Summarization | |
Overthinking the Truth: Understanding how Language Models Process False Demonstrations | |
Scalable Extraction of Training Data from (Production) Language Models | |
White-Box Transformers via Sparse Rate Reduction: Compression Is All There Is? | |
Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model | |
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization | |
ComPEFT: Compression for Communicating Parameter Efficient Updates via Sparsification and Quantization | |
Visual In-Context Prompting | |
PG-Video-LLaVA: Pixel Grounding Large Video-Language Models | |
GAIA: a benchmark for General AI Assistants | |
More is Better in Modern Machine Learning: when Infinite Overparameterization is Optimal and Overfitting is Obligatory | |
Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia | |
Unnatural Error Correction: GPT-4 Can Almost Perfectly Handle Unnatural Scrambled Text | |
Chain-of-Thought Reasoning is a Policy Improvement Operator | |
Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | |
Thinking Fast and Slow in Large Language Models | |
Towards Accurate Differential Diagnosis with Large Language Models | |
Mamba: Linear-Time Sequence Modeling with Selective State Spaces | |
Vanishing Gradients in Reinforcement Finetuning of Language Models | |
The History and Risks of Reinforcement Learning and Human Feedback | |
Adapters: A Unified Library for Parameter-Efficient and Modular Transfer Learning | |
Video Language Planning | |
Thread of Thought Unraveling Chaotic Contexts | |
PaSS: Parallel Speculative Sampling | |
SeaLLMs -- Large Language Models for Southeast Asia | |
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models | |
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models | |
An LLM Compiler for Parallel Function Calling | |
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation | |
WinoGrande: An Adversarial Winograd Schema Challenge at Scale | |
Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey | |
Magicoder: Source Code Is All You Need | |
SILC: Improving Vision Language Pretraining with Self-Distillation | |
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback | |
TextGenSHAP: Scalable Post-hoc Explanations in Text Generation with Long Documents | |
An Early Evaluation of GPT-4V(ision) | |
Farzi Data: Autoregressive Data Distillation | |
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models | |
One Embedder, Any Task: Instruction-Finetuned Text Embeddings | |
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents | |
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want | |
Towards a Unified View of Parameter-Efficient Transfer Learning | |
Beyond Surface: Probing LLaMA Across Scales and Layers | |
TiC-CLIP: Continual Training of CLIP Models | |
GPT4Point: A Unified Framework for Point-Language Understanding and Generation | |
GOAT: GO to Any Thing | |
Nash Learning from Human Feedback | |
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs | |
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency | |
Axiomatic Preference Modeling for Longform Question Answering | |
FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling | |
Efficient Monotonic Multihead Attention | |
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings | |
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena | |
Are LLMs Useful in the Poorest Schools? theTeacherAI in Sierra Leone | |
De-Diffusion Makes Text a Strong Cross-Modal Interface | |
Dolphins: Multimodal Language Model for Driving | |
MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture | |
Efficient Transformer Knowledge Distillation: A Performance Review | |
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs | |
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments | |
Instruction-tuning Aligns LLMs to the Human Brain | |
Large Language Model Alignment: A Survey | |
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities | |
RoboVQA: Multimodal Long-Horizon Reasoning for Robotics | |
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models | |
GraphDreamer: Compositional 3D Scene Synthesis from Scene Graphs | |
Instruction-Following Evaluation for Large Language Models | |
ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs | |
Pre-Training to Learn in Context | |
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks | |
Large Language Models for Mathematicians | |
WhisBERT: Multimodal Text-Audio Language Modeling on 100M Words | |
Language Model Inversion | |
Training Chain-of-Thought via Latent-Variable Inference | |
The Quantization Model of Neural Scaling | |
Beyond ChatBots: ExploreLLM for Structured Thoughts and Personalized Model Responses | |
TinyGSM: achieving >80% on GSM8k with small language models | |
Context Tuning for Retrieval Augmented Generation | |
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning | |
TigerBot: An Open Multilingual Multitask LLM | |
PromptBench: A Unified Library for Evaluation of Large Language Models | |
Generating Fine-Grained Human Motions Using ChatGPT-Refined Descriptions | |
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models | |
Challenges with unsupervised LLM knowledge discovery | |
A Survey of Large Language Models in Medicine: Principles, Applications, and Challenges | |
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning | |
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision | |
Honeybee: Locality-enhanced Projector for Multimodal LLM | |
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation | |
ProTIP: Progressive Tool Retrieval Improves Planning | |
Catwalk: A Unified Language Model Evaluation Framework for Many Datasets | |
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models | |
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding | |
FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection | |
Unlocking Anticipatory Text Generation: A Constrained Approach for Faithful Decoding with Large Language Models | |
SparQ Attention: Bandwidth-Efficient LLM Inference | |
Silkie: Preference Distillation for Large Visual Language Models | |
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models | |
Algorithmic Collusion by Large Language Models | |
Mathematical Language Models: A Survey | |
Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention | |
FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects | |
Federated Full-Parameter Tuning of Billion-Sized Language Models with Communication Cost under 18 Kilobytes | |
Pixel Aligned Language Models | |
PathFinder: Guided Search over Multi-Step Reasoning Paths | |
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models | |
Vision-Language Models as a Source of Rewards | |
Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations | |
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3" | |
Language-Informed Visual Concept Learning | |
Evaluation of Large Language Models for Decision Making in Autonomous Driving | |
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent | |
Extending Context Window of Large Language Models via Semantic Compression | |
A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions | |
Formal Aspects of Language Modeling | |
Large Language Models on Graphs: A Comprehensive Survey | |
Merlin: Empowering Multimodal LLMs with Foresight Minds | |
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey | |
"I Want It That Way": Enabling Interactive Decision Support Using Large Language Models and Constraint Programming | |
Generating Illustrated Instructions | |
Alignment for Honesty | |
Paloma: A Benchmark for Evaluating Language Model Fit | |
Self-Evaluation Improves Selective Generation in Large Language Models | |
Nomic Embed: Training a Reproducible Long Context Text Embedder | |
Rejuvenating image-GPT as Strong Visual Representation Learners | |
Object Recognition as Next Token Prediction | |
Foundation Models in Robotics: Applications, Challenges, and the Future | |
Distributed Inference and Fine-tuning of Large Language Models Over The Internet | |
LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning | |
Data Management For Large Language Models: A Survey | |
AtP*: An efficient and scalable method for localizing LLM behaviour to components | |
Knowledge Distillation of Large Language Models | |
Faithful Persona-based Conversational Dataset Generation with Large Language Models | |
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze! | |
ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks | |
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism | |
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | |
Weight subcloning: direct initialization of transformers using larger pretrained ones | |
Segment and Caption Anything | |
Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation | |
Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models | |
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator | |
OneLLM: One Framework to Align All Modalities with Language | |
Steering Llama 2 via Contrastive Activation Addition | |
VILA: On Pre-training for Visual Language Models | |
TIP: Text-Driven Image Processing with Semantic and Restoration Instructions | |
HyperAttention: Long-context Attention in Near-Linear Time | |
LLM360: Towards Fully Transparent Open-Source LLMs | |
Efficient Transformers with Dynamic Token Pooling | |
GIVT: Generative Infinite-Vocabulary Transformers | |
Modeling Context in Referring Expressions | |
The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes | |
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise | |
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model | |
Text-Conditioned Resampler For Long Form Video Understanding | |
Gemini: A Family of Highly Capable Multimodal Models | |
LLMs are Not Just Next Token Predictors | |
LLM in a flash: Efficient Large Language Model Inference with Limited Memory | |
Cascade Speculative Drafting for Even Faster LLM Inference | |
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model | |
VideoPoet: A Large Language Model for Zero-Shot Video Generation | |
HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models | |
AppAgent: Multimodal Agents as Smartphone Users | |
Time is Encoded in the Weights of Finetuned Language Models | |
Generative Multimodal Models are In-Context Learners | |
Cached Transformers: Improving Transformers with Differentiable Memory Cache | |
Mini-GPTs: Efficient Large Language Models through Contextual Pruning | |
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | |
An In-depth Look at Gemini's Language Abilities | |
Retrieval-Augmented Generation for Large Language Models: A Survey | |
Intriguing Properties of Quantization at Scale | |
Parrot Captions Teach CLIP to Spot Text | |
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math | |
Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning | |
YAYI 2: Multilingual Open-Source Large Language Models | |
Reasons to Reject? Aligning Language Models with Judgments | |
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation | |
LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding | |
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion | |
Exploiting Novel GPT-4 APIs | |
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
VCoder: Versatile Vision Encoders for Multimodal Large Language Models | |
PreCog: Exploring the Relation between Memorization and Performance in Pre-trained Language Models | |
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases | |
LLM4VG: Large Language Models Evaluation for Video Grounding | |
Shai: A large language model for asset management | |
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation | |
LoRAMoE: Revolutionizing Mixture of Experts for Maintaining World Knowledge in Language Model Alignment | |
Principled Instructions Are All You Need for Questioning LLaMA-1/2, GPT-3.5/4 | |
Supervised Knowledge Makes Large Language Models Better In-context Learners | |
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | |
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases | |
The LLM Surgeon | |
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action | |
MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices | |
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones | |
Task Contamination: Language Models May Not Be Few-Shot Anymore | |
PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training | |
Learning Vision from Models Rivals Learning Vision from Data | |
TinyLlama: An Open-Source Small Language Model | |
Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models | |
PanGu-$\pi$: Enhancing Language Model Architectures via Nonlinearity Compensation | |
Making Large Language Models A Better Foundation For Dense Retrieval | |
LARP: Language-Agent Role Play for Open-World Games | |
A Survey of Reasoning with Foundation Models | |
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape | |
Fine-Tuning or Retrieval? Comparing Knowledge Injection in LLMs | |
Comparing Humans, GPT-4, and GPT-4V On Abstraction and Reasoning Tasks | |
Towards the Law of Capacity Gap in Distilling Language Models | |
At Which Training Stage Does Code Data Help LLMs Reasoning? | |
Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve | |
Evaluation of GPT-3.5 and GPT-4 for supporting real-world information needs in healthcare delivery | |
STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition | |
The Lazy Neuron Phenomenon: On Emergence of Activation Sparsity in Transformers | |
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models | |
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning | |
A Comprehensive Study of Knowledge Editing for Large Language Models | |
VideoDrafter: Content-Consistent Multi-Scene Video Generation with LLM | |
Orion-14B: Open-source Multilingual Large Language Models | |
LLaMA Beyond English: An Empirical Study on Language Capability Transfer | |
DocLLM: A layout-aware generative language model for multimodal document understanding | |
COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training | |
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents | |
Patchscope: A Unifying Framework for Inspecting Hidden Representations of Language Models | |
Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | |
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws | |
GeoGalactica: A Scientific Large Language Model in Geoscience | |
Improving Text Embeddings with Large Language Models | |
Boosting Large Language Model for Speech Synthesis: An Empirical Study | |
TrustLLM: Trustworthiness in Large Language Models | |
Unicron: Economizing Self-Healing LLM Training at Scale | |
MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining | |
Proving Test Set Contamination in Black Box Language Models | |
LLaMA Pro: Progressive LLaMA with Block Expansion | |
LLM Augmented LLMs: Expanding Capabilities through Composition | |
LLaVA-$\phi$: Efficient Multi-Modal Assistant with Small Language Model | |
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers | |
Understanding LLMs: A Comprehensive Overview from Training to Inference | |
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers | |
A Vision Check-up for Language Models | |
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | |
Multilingual Instruction Tuning With Just a Pinch of Multilinguality | |
WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope | |
GPT-4V(ision) is a Generalist Web Agent, if Grounded | |
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs | |
Mind2Web: Towards a Generalist Agent for the Web | |
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism | |
DocGraphLM: Documental Graph Language Model for Information Extraction | |
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache | |
TOFU: A Task of Fictitious Unlearning for LLMs | |
Transformers are Multi-State RNNs | |
Secrets of RLHF in Large Language Models Part II: Reward Modeling | |
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | |
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages | |
A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism | |
Towards Conversational Diagnostic AI | |
Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training | |
Efficient LLM inference solution on Intel GPU | |
I am a Strange Dataset: Metalinguistic Tests for Language Models | |
Bootstrapping LLM-based Task-Oriented Dialogue Agents via Self-Talk | |
Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models | |
The Impact of Reasoning Step Length on Large Language Models | |
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | |
Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding | |
Mixtral of Experts | |
ChatQA: Building GPT-4 Level Conversational QA Models | |
TeleChat Technical Report | |
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models | |
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon | |
AST-T5: Structure-Aware Pretraining for Code Generation and Understanding | |
Has Your Pretrained Model Improved? A Multi-head Posterior Based Approach | |
MaLA-500: Massive Language Adaptation of Large Language Models | |
The Unreasonable Effectiveness of Easy Training Data for Hard Tasks | |
Theory of Mind abilities of Large Language Models in Human-Robot Interaction : An Illusion? | |
State of What Art? A Call for Multi-Prompt LLM Evaluation | |
Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting | |
Compressing Context to Enhance Inference Efficiency of Large Language Models | |
Are ChatGPT and GPT-4 General-Purpose Solvers for Financial Text Analytics? A Study on Several Typical Tasks | |
VMamba: Visual State Space Model | |
DiffusionGPT: LLM-Driven Text-to-Image Generation System | |
Self-Rewarding Language Models | |
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model | |
Asynchronous Local-SGD Training for Language Modeling | |
ReFT: Reasoning with Reinforced Fine-Tuning | |
SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers | |
DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference | |
Tuning Language Models by Proxy | |
Scalable Pre-training of Large Autoregressive Image Models | |
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation | |
Extending LLMs' Context Window with 100 Samples | |
E^2-LLM: Efficient and Extreme Length Extension of Large Language Models | |
SPADE: Synthesizing Assertions for Large Language Model Pipelines | |
Foundations of Vector Retrieval | |
Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation | |
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads | |
Evaluating the Moral Beliefs Encoded in LLMs | |
Boosting Theory-of-Mind Performance in Large Language Models via Prompting | |
MambaByte: Token-free Selective State Space Model | |
RakutenAI-7B: Extending Large Language Models for Japanese | |
MM-LLMs: Recent Advances in MultiModal Large Language Models | |
AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents | |
Meta-Prompting: Enhancing Language Models with Task-Agnostic Scaffolding | |
Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study | |
Small Language Model Meets with Reinforced Vision Vocabulary | |
WARM: On the Benefits of Weight Averaged Reward Models | |
In-Context Learning for Extreme Multi-Label Classification | |
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | |
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text | |
CMMMU: A Chinese Massive Multi-discipline Multimodal Understanding Benchmark | |
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs | |
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models | |
UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion | |
What Are Tools Anyway? A Survey from the Language Model Perspective | |
ConTextual: Evaluating Context-Sensitive Text-Rich Visual Reasoning in Large Multimodal Models | |
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection | |
Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment | |
CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation | |
RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval | |
Mission: Impossible Language Models | |
Benchmarking LLMs via Uncertainty Quantification | |
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models | |
Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering | |
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence | |
H2O-Danube-1.8B Technical Report | |
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | |
CreativeSynth: Creative Blending and Synthesis of Visual Arts based on Multimodal Diffusion | |
Unitxt: Flexible, Shareable and Reusable Data Preparation and Evaluation for Generative AI | |
WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | |
Representation Engineering: A Top-Down Approach to AI Transparency | |
LongAlign: A Recipe for Long Context Alignment of Large Language Models | |
Scavenging Hyena: Distilling Transformers into Long Convolution Models | |
Efficient Tool Use with Chain-of-Abstraction Reasoning | |
YOLO-World: Real-Time Open-Vocabulary Object Detection | |
Weaver: Foundation Models for Creative Writing | |
Weak-to-Strong Jailbreaking on Large Language Models | |
Transfer Learning for Text Diffusion Models | |
StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis | |
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives | |
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | |
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling | |
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception | |
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | |
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty | |
RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture | |
Watermarking Makes Language Models Radioactive | |
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities | |
SliceGPT: Compress Large Language Models by Deleting Rows and Columns | |
Taiyi-Diffusion-XL: Advancing Bilingual Text-to-Image Generation with Large Vision-Language Model Support | |
Generative Expressive Robot Behaviors using Large Language Models | |
Efficient Exploration for LLMs | |
Can Large Language Models Understand Context? | |
SymbolicAI: A framework for logic-based approaches combining generative models and solvers | |
Tiny Titans: Can Smaller Large Language Models Punch Above Their Weight in the Real World for Meeting Summarization? | |
OLMo: Accelerating the Science of Language Models | |
Tree Prompting: Efficient Task Adaptation without Fine-Tuning | |
CroissantLLM: A Truly Bilingual French-English Language Model | |
Health-LLM: Personalized Retrieval-Augmented Disease Prediction Model | |
Transforming and Combining Rewards for Aligning Large Language Models | |
EE-Tuning: An Economical yet Scalable Solution for Tuning Early-Exit Large Language Models | |
Scaling Laws for Downstream Task Performance of Large Language Models | |
Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research | |
Seven Failure Points When Engineering a Retrieval Augmented Generation System | |
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters | |
Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks | |
CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations | |
Multi-line AI-assisted Code Authoring | |
Self-Discover: Large Language Models Self-Compose Reasoning Structures | |
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models | |
Training-Free Consistent Text-to-Image Generation | |
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | |
Shortened LLaMA: A Simple Depth Pruning for Large Language Models | |
Rethinking Optimization and Architecture for Tiny Language Models | |
LiPO: Listwise Preference Optimization through Learning-to-Rank | |
BlackMamba: Mixture of Experts for State-Space Models | |
Rethinking Interpretability in the Era of Large Language Models | |
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | |
TravelPlanner: A Benchmark for Real-World Planning with Language Agents | |
K-Level Reasoning with Large Language Models | |
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback | |
PokéLLMon: A Human-Parity Agent for Pokémon Battles with Large Language Models | |
Specialized Language Models with Cheap Inference from Limited Domain Data | |
Repeat After Me: Transformers are Better than State Space Models at Copying | |
A Survey on Hallucination in Large Vision-Language Models | |
Corrective Retrieval Augmented Generation | |
A Comprehensive Survey of Compression Algorithms for Language Models | |
Leveraging Large Language Models for NLG Evaluation: A Survey | |
The Power of Noise: Redefining Retrieval for RAG Systems | |
AgentBoard: An Analytical Evaluation Board of Multi-turn LLM Agents | |
Red Teaming Visual Language Models | |
Knowledge Fusion of Large Language Models | |
A Survey of Resource-efficient LLM and Multimodal Foundation Models | |
Lexinvariant Language Models | |
Noise2Music: Text-conditioned Music Generation with Diffusion Models | |
Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery | |
Mathematical Capabilities of ChatGPT | |
AtMan: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation | |
Large Language Models for Mathematical Reasoning: Progresses and Challenges | |
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | |
Driving Everywhere with Large Language Model Policy Adaptation | |
WebLINX: Real-World Website Navigation with Multi-Turn Dialogue | |
SpiRit-LM: Interleaved Spoken and Written Language Model | |
Multilingual E5 Text Embeddings: A Technical Report | |
In-Context Principle Learning from Mistakes | |
Tag-LLM: Repurposing General-Purpose LLMs for Specialized Domains | |
Hydragen: High-Throughput LLM Inference with Shared Prefixes | |
CodeIt: Self-Improving Language Models with Prioritized Hindsight Replay | |
Fast Timing-Conditioned Latent Audio Diffusion | |
Direct Language Model Alignment from Online AI Feedback | |
Grandmaster-Level Chess Without Search | |
Fine-Tuned Language Models Generate Stable Inorganic Materials as Text | |
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | |
Tandem Transformers for Inference Efficient LLMs | |
World Model on Million-Length Video And Language With RingAttention | |
Lumos : Empowering Multimodal LLMs with Scene Text Recognition | |
Suppressing Pink Elephants with Direct Principle Feedback | |
Policy Improvement using Language Feedback Models | |
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs | |
LokiLM: Technical Report | |
Scaling Laws for Fine-Grained Mixture of Experts | |
Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models | |
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model | |
AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts | |
Step-On-Feet Tuning: Scaling Self-Alignment of LLMs via Bootstrapping | |
OS-Copilot: Towards Generalist Computer Agents with Self-Improvement | |
ODIN: Disentangled Reward Mitigates Hacking in RLHF | |
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting | |
A Tale of Tails: Model Collapse as a Change of Scaling Laws | |
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | |
Generative Representational Instruction Tuning | |
ChemLLM: A Chemical Large Language Model | |
Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning | |
InternLM-Math: Open Math Large Language Models Toward Verifiable Reasoning | |
DeAL: Decoding-time Alignment for Large Language Models | |
Badllama 3: removing safety finetuning from Llama 3 in minutes | |
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling | |
SubGen: Token Generation in Sublinear Time and Memory | |
Keyframer: Empowering Animation Design using Large Language Models | |
Large Language Model for Table Processing: A Survey | |
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls | |
Approaching Human-Level Forecasting with Language Models | |
A phase transition between positional and semantic learning in a solvable model of dot-product attention | |
Large Language Models as an Indirect Reasoner: Contrapositive and Contradiction for Automated Reasoning | |
LLMs Can't Plan, But Can Help Planning in LLM-Modulo Frameworks | |
Large Language Model based Multi-Agents: A Survey of Progress and Challenges | |
Premise Order Matters in Reasoning with Large Language Models | |
Language Quantized AutoEncoders: Towards Unsupervised Text-Image Alignment | |
Chain-of-Thought Reasoning Without Prompting | |
BitDelta: Your Fine-Tune May Only Be Worth One Bit | |
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset | |
Data Engineering for Scaling Language Models to 128K Context | |
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | |
A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts | |
How to Train Data-Efficient LLMs | |
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects | |
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers | |
GhostWriter: Augmenting Collaborative Human-AI Writing Experiences Through Personalization and Agency | |
Formal-LLM: Integrating Formal Language and Natural Language for Controllable LLM-based Agents | |
Arrows of Time for Large Language Models | |
Coercing LLMs to do and reveal (almost) anything | |
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens | |
Speculative Streaming: Fast LLM Inference without Auxiliary Models | |
Ouroboros: Speculative Decoding with Large Model Enhanced Drafting | |
User-LLM: Efficient LLM Contextualization with User Embeddings | |
BBA: Bi-Modal Behavioral Alignment for Reasoning with Large Vision-Language Models | |
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization | |
How Easy is It to Fool Your Multimodal LLMs? An Empirical Analysis on Deceptive Prompts | |
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models | |
Instruction-tuned Language Models are Better Knowledge Learners | |
The FinBen: An Holistic Financial Benchmark for Large Language Models | |
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling | |
The boundary of neural network trainability is fractal | |
Reformatted Alignment | |
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning | |
LongAgent: Scaling Language Models to 128k Context through Multi-Agent Collaboration | |
OneBit: Towards Extremely Low-bit Large Language Models | |
CoLLaVO: Crayon Large Language and Vision mOdel | |
FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models | |
GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements | |
RLVF: Learning from Verbal Feedback without Overgeneralization | |
In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss | |
Linear Transformers with Learnable Kernel Functions are Better In-Context Models | |
Efficient Guided Generation for Large Language Models | |
SPAR: Personalized Content-Based Recommendation via Long Engagement Attention | |
LLM Comparator: Visual Analytics for Side-by-Side Evaluation of Large Language Models | |
Large Language Models as Zero-shot Dialogue State Tracker through Function Calling | |
DataDreamer: A Tool for Synthetic Data Generation and Reproducible LLM Workflows | |
LAVE: LLM-Powered Agent Assistance and Language Augmentation for Video Editing | |
Generative Language Modeling for Automated Theorem Proving | |
Automated Unit Test Improvement using Large Language Models at Meta | |
LLM Agents can Autonomously Hack Websites | |
Large Language Models: A Survey | |
In-Context Retrieval-Augmented Language Models | |
Consolidating Attention Features for Multi-view Image Editing | |
LLM-ABR: Designing Adaptive Bitrate Algorithms via Large Language Models | |
OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement | |
Scaling Up LLM Reviews for Google Ads Content Moderation | |
Subobject-level Image Tokenization | |
TinyLLaVA: A Framework of Small-scale Large Multimodal Models | |
Copilot Evaluation Harness: Evaluating LLM-Guided Software Programming | |
CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models | |
LexC-Gen: Generating Data for Extremely Low-Resource Languages with Large Language Models and Bilingual Lexicons | |
EvoPrompting: Language Models for Code-Level Neural Architecture Search | |
Goal Driven Discovery of Distributional Differences via Language Descriptions | |
ChatMusician: Understanding and Generating Music Intrinsically with LLM | |
GPTVQ: The Blessing of Dimensionality for LLM Quantization | |
FuseChat: Knowledge Fusion of Chat Models | |
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs | |
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning | |
API-BLEND: A Comprehensive Corpora for Training and Benchmarking API LLMs | |
ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition | |
Large Language Models for Data Annotation: A Survey | |
LoRA+: Efficient Low Rank Adaptation of Large Models | |
When is Tree Search Useful for LLM Planning? It Depends on the Discriminator | |
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | |
Towards Optimal Learning of Language Models | |
Evaluating Very Long-Term Conversational Memory of LLM Agents | |
Training-Free Long-Context Scaling of Large Language Models | |
Disentangled 3D Scene Generation with Layout Learning | |
Do Large Language Models Latently Perform Multi-Hop Reasoning? | |
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts | |
Nemotron-4 15B Technical Report | |
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding | |
StructLM: Towards Building Generalist Models for Structured Knowledge Grounding | |
Towards Open-ended Visual Quality Comparison | |
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method | |
MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT | |
Orca-Math: Unlocking the potential of SLMs in Grade School Math | |
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers | |
MOSAIC: A Modular System for Assistive and Interactive Cooking | |
Priority Sampling of Large Language Models for Compilers | |
Simple linear attention language models balance the recall-throughput tradeoff | |
API Is Enough: Conformal Prediction for Large Language Models Without Logit-Access | |
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models | |
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models | |
StarCoder 2 and The Stack v2: The Next Generation | |
Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models | |
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs | |
Simulacra as Conscious Exotica | |
Both Matter: Enhancing the Emotional Intelligence of Large Language Models without Compromising the General Intelligence | |
Enhancing Vision-Language Pre-training with Rich Supervisions | |
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | |
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs | |
Finetuned Multimodal Language Models Are High-Quality Image-Text Data Filters | |
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap | |
PlanGPT: Enhancing Urban Planning with Tailored Language Model and Efficient Retrieval | |
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey | |
Artifacts or Abduction: How Do LLMs Answer Multiple-Choice Questions Without the Question? | |
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models | |
Emergent and Predictable Memorization in Large Language Models | |
Design2Code: How Far Are We From Automating Front-End Engineering? | |
Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models | |
MathScale: Scaling Instruction Tuning for Mathematical Reasoning | |
Empowering Large Language Model Agents through Action Learning | |
Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use | |
RT-H: Action Hierarchies Using Language | |
DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models | |
Resonance RoPE: Improving Context Length Generalization of Large Language Models | |
Datasets for Large Language Models: A Comprehensive Survey | |
INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models | |
Do Efficient Transformers Really Save Computation? | |
MathPrompter: Mathematical Reasoning using Large Language Models | |
A Comprehensive Survey of AI-Generated Content (AIGC): A History of Generative AI from GAN to ChatGPT | |
Can Large Language Models Reason and Plan? | |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error | |
Common 7B Language Models Already Possess Strong Math Capabilities | |
Yi: Open Foundation Models by 01.AI | |
Teaching Large Language Models to Reason with Reinforcement Learning | |
SaulLM-7B: A pioneering Large Language Model for Law | |
Online Adaptation of Language Models with a Memory of Amortized Contexts | |
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference | |
Learning to Decode Collaboratively with Multiple Language Models | |
ShortGPT: Layers in Large Language Models are More Redundant Than You Expect | |
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection | |
The Unreasonable Effectiveness of Eccentric Automatic Prompts | |
A Survey on Evaluation of Large Language Models | |
The pitfalls of next-token prediction | |
Stealing Part of a Production Language Model | |
Algorithmic progress in language models | |
Thinking Tokens for Language Modeling | |
Is Cosine-Similarity of Embeddings Really About Similarity? | |
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment | |
Can't Remember Details in Long Documents? You Need Some R&R | |
KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents | |
Retrieval-Augmented Generation for AI-Generated Content: A Survey | |
LLM Task Interference: An Initial Study on the Impact of Task-Switch in Conversational History | |
3D-VLA: A 3D Vision-Language-Action Generative World Model | |
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking | |
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | |
GPT on a Quantum Computer | |
VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding | |
GiT: Towards Generalist Vision Transformer through Universal Language Interface | |
BurstAttention: An Efficient Distributed Attention Framework for Extremely Long Sequences | |
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | |
Social Skill Training with Large Language Models | |
StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control | |
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset | |
Veagle: Advancements in Multimodal Representation Learning | |
Simple and Scalable Strategies to Continually Pre-train Large Language Models | |
SOTOPIA-$\pi$: Interactive Learning of Socially Intelligent Language Agents | |
Language models scale reliably with over-training and on downstream tasks | |
Gemma: Open Models Based on Gemini Research and Technology | |
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code | |
On the Societal Impact of Open Foundation Models | |
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | |
Chronos: Learning the Language of Time Series | |
Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings | |
ORPO: Monolithic Preference Optimization without Reference Model | |
MoAI: Mixture of All Intelligence for Large Language and Vision Models | |
An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models | |
Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU | |
DeepSeek-VL: Towards Real-World Vision-Language Understanding | |
How Far Are We from Intelligent Visual Deductive Reasoning? | |
Small Models are Valuable Plug-ins for Large Language Models | |
Backtracing: Retrieving the Cause of the Query | |
MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies | |
VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | |
Learning to Generate Better Than Your LLM | |
Meta-in-context learning in large language models | |
LERF: Language Embedded Radiance Fields | |
Eliciting Latent Predictions from Transformers with the Tuned Lens | |
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | |
Resurrecting Recurrent Neural Networks for Long Sequences | |
An Overview on Language Models: Recent Developments and Outlook | |
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | |
A Survey of Evaluation Metrics Used for NLG Systems | |
SummEval: Re-evaluating Summarization Evaluation | |
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning | |
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | |
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences | |
LLMR: Real-time Prompting of Interactive Worlds using Large Language Models | |
Logits of API-Protected LLMs Leak Proprietary Information | |
Knowledge Conflicts for LLMs: A Survey | |
Revolutionizing Mental Health Care through LangChain: A Journey with a Large Language Model | |
Will GPT-4 Run DOOM? | |
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation | |
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models | |
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization | |
Large language models surpass human experts in predicting neuroscience results | |
Reliable, Adaptable, and Attributable Language Models with Retrieval | |
You Need to Pay Better Attention | |
RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval | |
Stable LM 2 1.6B Technical Report | |
DropBP: Accelerating Fine-Tuning of Large Language Models by Dropping Backward Propagation | |
A Survey on Data Selection for Language Models | |
PRP: Propagating Universal Perturbations to Attack Large Language Model Guard-Rails | |
Repetition Improves Language Model Embeddings | |
How Transformers Learn Causal Structure with Gradient Descent | |
Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models | |
Analysing The Impact of Sequence Composition on Language Model Pre-Training | |
Beyond Probabilities: Unveiling the Misalignment in Evaluating Large Language Models | |
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models | |
ProSparse: Introducing and Enhancing Intrinsic Activation Sparsity within Large Language Models | |
Bayesian Reward Models for LLM Alignment | |
KMMLU: Measuring Massive Multitask Language Understanding in Korean | |
Dissecting Human and LLM Preferences | |
Exploring Value Biases: How LLMs Deviate Towards the Ideal | |
Do Llamas Work in English? On the Latent Language of Multilingual Transformers | |
RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models | |
Why are Sensitive Functions Hard for Transformers? | |
Agents Need Not Know Their Purpose | |
Copyright Traps for Large Language Models | |
DoRA: Weight-Decomposed Low-Rank Adaptation | |
SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks | |
Rethinking Machine Unlearning for Large Language Models | |
Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast | |
Improving Black-box Robustness with In-Context Rewriting | |
Secret Collusion Among Generative AI Agents | |
Natural Language Reinforcement Learning | |
Universal Neural Functionals | |
QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks | |
LESS: Selecting Influential Data for Targeted Instruction Tuning | |
Building Your Own Product Copilot: Challenges, Opportunities, and Needs | |
ReLU$^2$ Wins: Discovering Efficient Activation Functions for Sparse LLMs | |
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | |
Continual Learning for Large Language Models: A Survey | |
Towards Efficient and Exact Optimization of Language Model Alignment | |
HyperZ$\cdot$Z$\cdot$W Operator Connects Slow-Fast Networks for Full Context Interaction | |
OMPGPT: A Generative Pre-trained Transformer Model for OpenMP | |
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness | |
APAR: LLMs Can Do Auto-Parallel Auto-Regressive Decoding | |
Spike No More: Stabilizing the Pre-training of Large Language Models | |
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems | |
Are Neighbors Enough? Multi-Head Neural n-gram can be Alternative to Self-attention | |
Zoology: Measuring and Improving Recall in Efficient Language Models | |
GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer | |
Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch | |
LoBaSS: Gauging Learnability in Supervised Fine-tuning Data | |
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective | |
Instruction Tuning with Human Curriculum | |
MatFormer: Nested Transformer for Elastic Inference | |
Ada-Instruct: Adapting Instruction Generators for Complex Reasoning | |
xVal: A Continuous Number Encoding for Large Language Models | |
Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models | |
Human Feedback is not Gold Standard | |
DeWave: Discrete EEG Waves Encoding for Brain Dynamics to Text Translation | |
Headless Language Models: Learning without Predicting with Contrastive Weight Tying | |
HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models | |
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning | |
Do language models plan ahead for future tokens? | |
CAME: Confidence-guided Adaptive Memory Efficient Optimization | |
Improving Language Plasticity via Pretraining with Active Forgetting | |
AdANNS: A Framework for Adaptive Semantic Search | |
Strategic Reasoning with Language Models | |
MixCE: Training Autoregressive Language Models by Mixing Forward and Reverse Cross-Entropies | |
Sparse is Enough in Scaling Transformers | |
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback | |
A Theory on Adam Instability in Large-Scale Machine Learning | |
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning | |
Are Language Models Worse than Humans at Following Prompts? It's Complicated | |
PropSegmEnt: A Large-Scale Corpus for Proposition-Level Segmentation and Entailment Recognition | |
Transformer Language Models without Positional Encodings Still Learn Positional Information | |
Sequence Parallelism: Long Sequence Training from System Perspective | |
Bio-inspired Structure Identification in Language Embeddings | |
Transformers without Tears: Improving the Normalization of Self-Attention | |
Neural Text Generation with Unlikelihood Training | |
MASS: Masked Sequence to Sequence Pre-training for Language Generation | |
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs | |
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding | |
Agent-FLAN: Designing Data and Methods of Effective Agent Tuning for Large Language Models | |
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs | |
TnT-LLM: Text Mining at Scale with Large Language Models | |
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | |
Larimar: Large Language Models with Episodic Memory Control | |
LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images | |
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding | |
MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data | |
PERL: Parameter Efficient Reinforcement Learning from Human Feedback | |
Isotropic3D: Image-to-3D Generation Based on a Single CLIP Embedding | |
Uni-SMART: Universal Science Multimodal Analysis and Research Transformer | |
RAFT: Adapting Language Model to Domain Specific RAG | |
Recurrent Drafter for Fast Speculative Decoding in Large Language Models | |
Alignment Studio: Aligning Large Language Models to Particular Contextual Regulations | |
Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | |
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews | |
Language Agents as Optimizable Graphs | |
Comparative Study of Large Language Model Architectures on Frontier | |
Optimizing Distributed Training on Frontier for Large Language Models | |
Striped Attention: Faster Ring Attention for Causal Transformers | |
Block-Recurrent Transformers | |
Addressing Some Limitations of Transformers with Feedback Memory | |
Reverse Training to Nurse the Reversal Curse | |
Evaluating Frontier Models for Dangerous Capabilities | |
SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model | |
When Do We Not Need Larger Vision Models? | |
LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression | |
Towards 3D Molecule-Text Interpretation in Language Models | |
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis | |
Mixture of Soft Prompts for Controllable Data Generation | |
HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models | |
Evolutionary Optimization of Model Merging Recipes | |
Semiparametric Token-Sequence Co-Supervision | |
MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries | |
On Learning to Summarize with Large Language Models as References | |
Scalable Prompt Generation for Semi-supervised Learning with Language Models | |
From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models | |
MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? | |
MyVLM: Personalizing VLMs for User-Specific Queries | |
Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference | |
Recourse for reclamation: Chatting with generative language models | |
On the Conversational Persuasiveness of Large Language Models: A Randomized Controlled Trial | |
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging | |
The MiniPile Challenge for Data-Efficient Language Models | |
OmniNet: Omnidirectional Representations from Transformers | |
Arcee's MergeKit: A Toolkit for Merging Large Language Models | |
FinLlama: Financial Sentiment Classification for Algorithmic Trading Applications | |
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | |
The Case for Co-Designing Model Architectures with Hardware | |
The Unreasonable Ineffectiveness of the Deeper Layers | |
Improving Text-to-Image Consistency via Automatic Prompt Optimization | |
InternLM2 Technical Report | |
AIOS: LLM Agent Operating System | |
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression | |
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | |
Can large language models explore in-context? | |
SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series | |
FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions | |
BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text | |
AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models | |
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | |
VidLA: Video-Language Alignment at Scale | |
Compiler generated feedback for Large Language Models | |
sDPO: Don't Use Your Data All at Once | |
Polaris: A Safety-focused LLM Constellation Architecture for Healthcare | |
RankPrompt: Step-by-Step Comparisons Make Language Models Better Reasoners | |
LLM4Decompile: Decompiling Binary Code with Large Language Models | |
Getting the most out of your tokenizer for pre-training and domain adaptation | |
How do different tokenizers perform on downstream tasks in scriptio continua languages?: A case study in Japanese | |
Wider and Deeper LLM Networks are Fairer LLM Evaluators | |
Editing Large Language Models: Problems, Methods, and Opportunities | |
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models | |
Long-form factuality in large language models | |
Towards a World-English Language Model for On-Device Virtual Assistants | |
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | |
MANTa: Efficient Gradient-Based Tokenization for Robust End-to-End Language Modeling | |
STaR-GATE: Teaching Language Models to Ask Clarifying Questions | |
Trusting Your Evidence: Hallucinate Less with Context-aware Decoding | |
LITA: Language Instructed Temporal-Localization Assistant | |
TextCraftor: Your Text Encoder Can be Image Quality Controller | |
Mechanistic Design and Scaling of Hybrid Architectures | |
Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines | |
SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore | |
Blockwise Parallel Transformer for Large Context Models | |
Large Language Models Can Be Strong Differentially Private Learners | |
Head-wise Shareable Attention for Large Language Models | |
Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models | |
ReALM: Reference Resolution As Language Modeling | |
Gecko: Versatile Text Embeddings Distilled from Large Language Models | |
Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs | |
Semantically-Shifted Incremental Adapter-Tuning is A Continual ViTransformer | |
Enhancing the General Agent Capabilities of Low-Parameter LLMs through Tuning and Multi-Branch Reasoning | |
DiJiang: Efficient Large Language Models through Compact Kernelization | |
Jamba: A Hybrid Transformer-Mamba Language Model | |
Localizing Paragraph Memorization in Language Models | |
The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction | |
Group Preference Optimization: Few-Shot Alignment of Large Language Models | |
Communicative Agents for Software Development | |
Preference Ranking Optimization for Human Alignment | |
The CRINGE Loss: Learning what language not to model | |
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action | |
Attribute First, then Generate: Locally-attributable Grounded Text Generation | |
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models | |
FABLES: Evaluating faithfulness and content selection in book-length summarization | |
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward | |
WavLLM: Towards Robust and Adaptive Speech Large Language Model | |
MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text | |
ST-LLM: Large Language Models Are Effective Temporal Learners | |
Advancing LLM Reasoning Generalists with Preference Trees | |
Best Practices and Lessons Learned on Synthetic Data for Language Models | |
Long-context LLMs Struggle with Long In-context Learning | |
HyperCLOVA X Technical Report | |
Poro 34B and the Blessing of Multilinguality | |
Octopus v2: On-device language model for super agent | |
Are large language models superhuman chemists? | |
LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model | |
A comparison of Human, GPT-3.5, and GPT-4 Performance in a University-Level Coding Course | |
ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | |
Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models | |
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models | |
Cross-Architecture Transfer Learning for Linear-Cost Inference Transformers | |
Auxiliary task demands mask the capabilities of smaller language models | |
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks | |
Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity | |
Can LLMs Separate Instructions From Data? And What Do We Even Mean By That? | |
Data Interpreter: An LLM Agent For Data Science | |
AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | |
Training LLMs over Neurally Compressed Text | |
Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | |
ReFT: Representation Finetuning for Language Models | |
CodeEditorBench: Evaluating Code Editing Capability of Large Language Models | |
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens | |
Red Teaming GPT-4V: Are GPT-4V Safe Against Uni/Multi-Modal Jailbreak Attacks? | |
RALL-E: Robust Codec Language Modeling with Chain-of-Thought Prompting for Text-to-Speech Synthesis | |
LVLM-Interpret: An Interpretability Tool for Large Vision-Language Models | |
Noise-Aware Training of Layout-Aware Language Models | |
AI and the Problem of Knowledge Collapse | |
Learning to Plan and Generate Text with Citations | |
The Probabilities Also Matter: A More Faithful Metric for Faithfulness of Free-Text Explanations in Large Language Models | |
An Incomplete Loop: Deductive, Inductive, and Abductive Learning in Large Language Models | |
ALOHa: A New Measure for Hallucination in Captioning Models | |
Efficient Multi-Vector Dense Retrieval Using Bit Vectors | |
Prompts As Programs: A Structure-Aware Approach to Efficient Compile-Time Prompt Optimization | |
Iterative Forward Tuning Boosts In-context Learning in Language Models | |
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | |
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | |
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues | |
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences | |
Stream of Search (SoS): Learning to Search in Language | |
Large Product Key Memory for Pretrained Language Models | |
Large Memory Layers with Product Keys | |
BRAVE: Broadening the visual encoding of vision-language models | |
Adapting LLaMA Decoder to Vision Transformer | |
RULER: What's the Real Context Size of Your Long-Context Language Models? | |
Elephants Never Forget: Memorization and Learning of Tabular Data in Large Language Models | |
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD | |
Reconstructing Hand-Held Objects in 3D | |
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | |
MuPT: A Generative Symbolic Music Pretrained Transformer | |
OmniFusion Technical Report | |
LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders | |
CodecLM: Aligning Language Models with Tailored Synthetic Data | |
SambaLingo: Teaching Large Language Models New Languages | |
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | |
MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | |
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | |
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws | |
Koala: Key frame-conditioned long video-LLM | |
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models | |
Understanding Emergent Abilities of Language Models from the Loss Perspective | |
Enhancing Formal Theorem Proving: A Comprehensive Dataset for Training AI Models on Coq Code | |
Making Large Language Models Better Data Creators | |
On Surgical Fine-tuning for Language Encoders | |
AdaLomo: Low-memory Optimization with Adaptive Learning Rate | |
FinGPT: Instruction Tuning Benchmark for Open-Source Large Language Models in Financial Datasets | |
Embedding Democratic Values into Social Media AIs via Societal Objective Functions | |
Large Language Models as Commonsense Knowledge for Large-Scale Task Planning | |
Less is More: Selective Layer Finetuning with SubTuning | |
Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning | |
AdaVAE: Exploring Adaptive GPT-2s in Variational Auto-Encoders for Language Modeling | |
Cut the CARP: Fishing for zero-shot story evaluation | |
LLoCO: Learning Long Contexts Offline | |
Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | |
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments | |
Rho-1: Not All Tokens Are What You Need | |
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models | |
Audio Dialogues: Dialogues dataset for audio and music understanding | |
From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples | |
JetMoE: Reaching Llama2 Performance with 0.1M Dollars | |
Tackling Polysemanticity with Neuron Embeddings | |
WILBUR: Adaptive In-Context Learning for Robust and Accurate Web Agents | |
Entity-Level Sentiment Analysis (ELSA): An exploratory task survey | |
ResearchAgent: Iterative Research Idea Generation over Scientific Literature with Large Language Models | |
Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | |
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators | |
Mechanics of Next Token Prediction with Self-Attention | |
Scaling Laws of RoPE-based Extrapolation | |
Pre-training Small Base LMs with Fewer Tokens | |
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies | |
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck | |
THOUGHTSCULPT: Reasoning with Intermediate Revision and Search | |
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers | |
Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data | |
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought | |
Toward a Theory of Tokenization in LLMs | |
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | |
Efficient and Effective Text Encoding for Chinese LLaMA and Alpaca | |
Learn Your Reference Model for Real Good Alignment | |
Large Language Models are as persuasive as humans, but why? About the cognitive effort and moral-emotional language of LLM arguments | |
TextHawk: Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models | |
TransformerFAM: Feedback attention is working memory | |
On Speculative Decoding for Multimodal Large Language Models | |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | |
Generative Disco: Text-to-Video Generation for Music Visualization | |
Self-playing Adversarial Language Game Enhances LLM Reasoning | |
Compression Represents Intelligence Linearly | |
The Illusion of State in State-Space Models | |
ChatGPT Can Predict the Future when it Tells Stories Set in the Future About the Past | |
A Thorough Examination of Decoding Methods in the Era of LLMs | |
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA | |
What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? | |
Should You Mask 15% in Masked Language Modeling? | |
Finetuning Pretrained Transformers into RNNs | |
BLINK: Multimodal Large Language Models Can See but Not Perceive | |
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models | |
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment | |
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing | |
OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data | |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | |
MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | |
When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | |
Fewer Truncations Improve Language Modeling | |
Glitch Tokens in Large Language Models: Categorization Taxonomy and Effective Detection | |
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity | |
Many-Shot In-Context Learning | |
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning | |
Exploring the landscape of large language models: Foundations, techniques, and challenges | |
Automated Social Science: Language Models as Scientist and Subjects | |
Language Models Still Struggle to Zero-shot Reason about Time Series | |
Stepwise Alignment for Constrained Language Model Policy Optimization | |
MiniCheck: Efficient Fact-Checking of LLMs on Grounding Documents | |
Language Imbalance Can Boost Cross-lingual Generalisation | |
Fine Tuning vs. Retrieval Augmented Generation for Less Popular Knowledge | |
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | |
LLM-R2: A Large Language Model Enhanced Rule-based Rewrite System for Boosting Query Efficiency | |
TextSquare: Scaling up Text-Centric Visual Instruction Tuning | |
Large Language Models are Few-Shot Health Learners | |
How Far Can We Go with Practical Function-Level Program Repair? | |
AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation | |
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey | |
A Survey on Retrieval-Augmented Text Generation for Large Language Models | |
A RAG Method for Source Code Inquiry Tailored to Long-Context LLMs | |
How faithful are RAG models? Quantifying the tug-of-war between RAG and LLMs' internal prior | |
State Space Model for New-Generation Network Alternative to Transformers: A Survey | |
LLM In-Context Recall is Prompt Dependent | |
Reducing hallucination in structured outputs via Retrieval-Augmented Generation | |
Towards Large Language Models as Copilots for Theorem Proving in Lean | |
Characterizing LLM Abstention Behavior in Science QA with Context Perturbations | |
From $r$ to $Q^*$: Your Language Model is Secretly a Q-Function | |
Who Validates the Validators? Aligning LLM-Assisted Evaluation of LLM Outputs with Human Preferences | |
Aligning language models with human preferences | |
Lossless Acceleration of Large Language Model via Adaptive N-gram Parallel Decoding | |
Superposition Prompting: Improving and Accelerating Retrieval-Augmented Generation | |
Not All Contexts Are Equal: Teaching LLMs Credibility-aware Generation | |
RAR-b: Reasoning as Retrieval Benchmark | |
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models | |
Deep Reinforcement Learning with a Natural Language Action Space | |
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | |
How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study | |
The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions | |
FlowMind: Automatic Workflow Generation with LLMs | |
XC-Cache: Cross-Attending to Cached Context for Efficient LLM Inference | |
DataComp: In search of the next generation of multimodal datasets | |
Stable and low-precision training for large-scale vision-language models | |
Multi-Head Mixture-of-Experts | |
Transformers Can Represent $n$-gram Language Models | |
Pegasus-v1 Technical Report | |
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs | |
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework | |
SnapKV: LLM Knows What You are Looking for Before Generation | |
SpaceByte: Towards Deleting Tokenization from Large Language Modeling | |
A Survey on Self-Evolution of Large Language Models | |
Retrieval Head Mechanistically Explains Long-Context Factuality | |
Self-Supervised Alignment with Mutual Information: Learning to Follow Principles without Preference Labels | |
SPLATE: Sparse Late Interaction Retrieval | |
VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models | |
AgentKit: Flow Engineering with Graphs, not Coding | |
Rethinking LLM Memorization through the Lens of Adversarial Compression | |
What's the Magic Word? A Control Theory of LLM Prompting | |
Adapting Language Models to Compress Contexts | |
Investigating the Role of Feed-Forward Networks in Transformers Using Parallel Attention and Feed-Forward Net Design | |
LMentry: A Language Model Benchmark of Elementary Language Tasks | |
ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | |
Achieving >97% on GSM8K: Deeply Understanding the Problems Makes LLMs Perfect Reasoners | |
Graph Machine Learning in the Era of Large Language Models (LLMs) | |
NExT: Teaching Large Language Models to Reason about Code Execution | |
"If the Machine Is As Good As Me, Then What Use Am I?" -- How the Use of ChatGPT Changes Young Professionals' Perception of Productivity and Accomplishment | |
Can Language Models Solve Olympiad Programming? | |
Constructing Benchmarks and Interventions for Combating Hallucinations in LLMs | |
CATS: Contextually-Aware Thresholding for Sparsity in Large Language Models | |
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | |
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | |
Make Your LLM Fully Utilize the Context | |
Weak-to-Strong Extrapolation Expedites Alignment | |
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | |
Continual Learning of Large Language Models: A Comprehensive Survey | |
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding | |
Tele-FLM Technical Report | |
TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning | |
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs | |
Let's Think Dot by Dot: Hidden Computation in Transformer Language Models | |
MoDE: CLIP Data Experts via Clustering | |
Universal Adversarial Triggers Are Not Universal | |
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | |
Improving Dictionary Learning with Gated Sparse Autoencoders | |
BASS: Batched Attention-optimized Speculative Sampling | |
CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models | |
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data | |
Image Segmentation Using Text and Image Prompts | |
Holistic Safety and Responsibility Evaluations of Advanced AI Models | |
WangLab at MEDIQA-CORR 2024: Optimized LLM-based Programs for Medical Error Detection and Correction | |
NORMAD: A Benchmark for Measuring the Cultural Adaptability of Large Language Models | |
Low-Cost Language Models: Survey and Performance Evaluation on Python Code Generation | |
Efficient Continual Pre-training for Building Domain Specific Large Language Models | |
DeLighT: Deep and Light-weight Transformer | |
Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically | |
GeckOpt: LLM System Efficiency via Intent-Based Tool Selection | |
Better Synthetic Data by Retrieving and Transforming Existing Datasets | |
Relational Graph Convolutional Networks for Sentiment Analysis | |
Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | |
Foundational Challenges in Assuring Alignment and Safety of Large Language Models | |
Nyonic Technical Report | |
LLM Evaluators Recognize and Favor Their Own Generations | |
PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | |
A Survey of Generative Search and Recommendation in the Era of Large Language Models | |
AdvPrompter: Fast Adaptive Adversarial Prompting for LLMs | |
A Primer on the Inner Workings of Transformer-based Language Models | |
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF | |
zkLLM: Zero Knowledge Proofs for Large Language Models | |
A Survey on the Memory Mechanism of Large Language Model based Agents | |
Large Language Model Agent as a Mechanical Designer | |
Talking Nonsense: Probing Large Language Models' Understanding of Adversarial Gibberish Inputs | |
Near to Mid-term Risks and Opportunities of Open Source Generative AI | |
Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks | |
Benchmarking Mobile Device Control Agents across Diverse Configurations | |
Evaluating Large Language Models on Time Series Feature Understanding: A Comprehensive Taxonomy and Benchmark | |
Assessing The Potential Of Mid-Sized Language Models For Clinical QA | |
Conformal Prediction for Natural Language Processing: A Survey | |
Dual Modalities of Text: Visual and Textual Generative Pre-training | |
AMOR: A Recipe for Building Adaptable Modular Knowledge Agents Through Process Feedback | |
Predicting Emergent Abilities with Infinite Resolution Evaluation | |
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding | |
Hallucination of Multimodal Large Language Models: A Survey | |
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | |
Benchmarking Benchmark Leakage in Large Language Models | |
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations | |
ChuXin: 1.6B Technical Report | |
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models | |
PromptReps: Prompting Large Language Models to Generate Dense and Sparse Representations for Zero-Shot Document Retrieval | |
LEGENT: Open Platform for Embodied Agents | |
From Persona to Personalization: A Survey on Role-Playing Language Agents | |
CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments | |
BlenderAlchemy: Editing 3D Graphics with Vision-Language Models | |
Autonomous LLM-driven research from data to human-verifiable research papers | |
Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | |
Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare | |
Semantic Routing for Enhanced Performance of LLM-Assisted Intent-Based 5G Core Network Management and Orchestration | |
RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation | |
Beyond Words: A Mathematical Framework for Interpreting Large Language Models | |
BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers | |
Ranked List Truncation for Large Language Model-based Re-Ranking | |
Building a Large Japanese Web Corpus for Large Language Models | |
STaRK: Benchmarking LLM Retrieval on Textual and Relational Knowledge Bases | |
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey | |
DOCCI: Descriptions of Connected and Contrasting Images | |
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | |
ProphetNet: Predicting Future N-gram for Sequence-to-Sequence Pre-training | |
Better & Faster Large Language Models via Multi-token Prediction | |
When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | |
Extending Llama-3's Context Ten-Fold Overnight | |
Octopus v4: Graph of language models | |
Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics | |
ChatGPTest: opportunities and cautionary tales of utilizing AI for questionnaire pretesting | |
How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library | |
Faster Convergence for Transformer Fine-tuning with Line Search Methods | |
Linear Transformers Are Secretly Fast Weight Programmers | |
FLAME: Factuality-Aware Alignment for Large Language Models | |
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | |
In-Context Learning Creates Task Vectors | |
WildChat: 1M ChatGPT Interaction Logs in the Wild | |
"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval" | |
LLM-AD: Large Language Model based Audio Description System | |
PLAID SHIRTTT for Large-Scale Streaming Dense Retrieval | |
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report | |
Self-Play Preference Optimization for Language Model Alignment | |
Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 | |
A Careful Examination of Large Language Model Performance on Grade School Arithmetic | |
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge | |
Self-Refine Instruction-Tuning for Aligning Reasoning in Language Models | |
Automatic Creative Selection with Cross-Modal Matching | |
Harmonic LLMs are Trustworthy | |
On Training a Neural Network to Explain Binaries | |
In-Context Learning with Long-Context Models: An In-Depth Exploration | |
Transferring Troubles: Cross-Lingual Transferability of Backdoor Attacks in LLMs with Instruction Tuning | |
Aligning LLM Agents by Learning Latent Preference from User Edits | |
How Well Can LLMs Negotiate? NegotiationArena Platform and Analysis | |
Neural Networks Learn Statistics of Increasing Complexity | |
Emerging Properties in Self-Supervised Vision Transformers | |
Advancing Multimodal Medical Capabilities of Gemini | |
"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | |
D2PO: Discriminator-Guided DPO with Response Evaluation Models | |
Controllable Text Generation in the Instruction-Tuning Era | |
MANTIS: Interleaved Multi-Image Instruction Tuning | |
A Philosophical Introduction to Language Models - Part II: The Way Forward | |
RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing | |
How do Large Language Models Handle Multilingualism? | |
FinBERT: Financial Sentiment Analysis with Pre-trained Language Models | |
Modeling Emotions and Ethics with Large Language Models | |
Structured Chemistry Reasoning with Large Language Models | |
Can Sensitive Information Be Deleted From LLMs? Objectives for Defending Against Extraction Attacks | |
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO | |
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | |
MaxMin-RLHF: Towards Equitable Alignment of Large Language Models with Diverse Human Preferences | |
Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving | |
Characterising the Creative Process in Humans and Large Language Models | |
ECC Analyzer: Extract Trading Signal from Earnings Conference Calls using Large Language Model for Stock Performance Prediction | |
Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering | |
AlphaMath Almost Zero: process Supervision without process | |
MAmmoTH2: Scaling Instructions from the Web | |
Is Flash Attention Stable? | |
ImageInWords: Unlocking Hyper-Detailed Image Descriptions | |
What matters when building vision-language models? | |
The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | |
Understanding LLMs Requires More Than Statistical Generalization | |
Efficient and Economic Large Language Model Inference with Attention Offloading | |
A Survey on Large Language Models for Critical Societal Domains: Finance, Healthcare, and Law | |
Large Language Models are Inconsistent and Biased Evaluators | |
101 Billion Arabic Words Dataset | |
What is Sentiment Meant to Mean to Language Models? | |
GPT-4 passes most of the 297 written Polish Board Certification Examinations | |
Text Quality-Based Pruning for Efficient Training of Language Models | |
Uncovering Deceptive Tendencies in Language Models: A Simulated Company AI Assistant | |
On the Evaluation of Machine-Generated Reports | |
Automatic Programming: Large Language Models and Beyond | |
Long-Term Human Trajectory Prediction using 3D Dynamic Scene Graphs | |
Multi-hop Question Answering over Knowledge Graphs using Large Language Models | |
Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference | |
Parallel Structures in Pre-training Data Yield In-Context Learning | |
BooookScore: A systematic exploration of book-length summarization in the era of LLMs | |
Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | |
Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders | |
Position Paper: Leveraging Foundational Models for Black-Box Optimization: Benefits, Challenges, and Future Directions | |
Lory: Fully Differentiable Mixture-of-Experts for Autoregressive Language Model Pre-training | |
Beyond Helpfulness and Harmlessness: Eliciting Diverse Behaviors from Large Language Models with Persona In-Context Learning | |
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
ReZero is All You Need: Fast Convergence at Large Depth | |
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | |
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | |
A Transformer with Stack Attention | |
xLSTM: Extended Long Short-Term Memory | |
Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | |
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | |
The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring | |
Parameter-Efficient Fine-Tuning with Discrete Fourier Transform | |
Granite Code Models: A Family of Open Foundation Models for Code Intelligence | |
FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference | |
Sketch Then Generate: Providing Incremental User Feedback and Guiding LLM Code Generation through Language-Oriented Code Sketches | |
Assemblage: Automatic Binary Dataset Construction for Machine Learning | |
Knowledge Adaptation from Large Language Model to Recommendation for Practical Industrial Application | |
Modeling Caption Diversity in Contrastive Vision-Language Pretraining | |
CLLMs: Consistency Large Language Models | |
You Only Cache Once: Decoder-Decoder Architectures for Language Models | |
VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual Context | |
From LLMs to Actions: Latent Codes as Bridges in Hierarchical Robot Control | |
Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals | |
Chain of Thoughtlessness: An Analysis of CoT in Planning | |
LLMs Can Patch Up Missing Relevance Judgments in Evaluation | |
Robust Implementation of Retrieval-Augmented Generation on Edge-based Computing-in-Memory Architectures | |
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | |
How Susceptible are Large Language Models to Ideological Manipulation? | |
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | |
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers | |
Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | |
Arctic-Embed: Scalable, Efficient, and Accurate Text Embedding Models | |
Can We Use Large Language Models to Fill Relevance Judgment Holes? | |
Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias | |
Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models | |
Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics | |
The Dark Side of Dataset Scaling: Evaluating Racial Classification in Multimodal Models | |
PoPE: Legendre Orthogonal Polynomials Based Position Encoding for Large Language Models | |
Automating the Enterprise with Foundation Models | |
Enhancing Q-Learning with Large Language Model Heuristics | |
Can Nuanced Language Lead to More Actionable Insights? Exploring the Role of Generative AI in Analytical Narrative Structure | |
Language Modeling Using Tensor Trains | |
PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation | |
Semantic Scaling: Bayesian Ideal Point Estimates with Large Language Models | |
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis | |
One vs. Many: Comprehending Accurate Information from Multiple Erroneous and Inconsistent AI Generations | |
Large Language Models (LLMs) as Agents for Augmented Democracy | |
Scaling Laws for Forgetting When Fine-Tuning Large Language Models | |
GROVE: A Retrieval-augmented Complex Story Generation Framework with A Forest of Evidence | |
Natural Language Processing RELIES on Linguistics | |
Probing Multimodal LLMs as World Models for Driving | |
AttacKG+: Boosting Attack Knowledge Graph Construction with Large Language Models | |
Deception in Reinforced Autonomous Agents: The Unconventional Rabbit Hat Trick in Legislation | |
A Causal Explainable Guardrails for Large Language Models | |
In-Context Symbolic Regression: Leveraging Language Models for Function Discovery | |
Plan of Thoughts: Heuristic-Guided Problem Solving with Large Language Models | |
Value Augmented Sampling for Language Model Alignment and Personalization | |
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology | |
A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models | |
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory | |
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | |
Which Nigerian-Pidgin does Generative AI speak?: Issues about Representativeness and Bias for Multilingual and Low Resource Languages | |
Sub-goal Distillation: A Method to Improve Small Language Agents | |
MAmmoTH: Building Math Generalist Models through Hybrid Instruction Tuning | |
Linearizing Large Language Models | |
Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | |
LMD3: Language Model Data Density Dependence | |
State-Free Inference of State-Space Models: The Transfer Function Approach | |
Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance | |
Masked Structural Growth for 2x Faster Language Model Pre-training | |
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | |
A Generalist Learner for Multifaceted Medical Image Interpretation | |
The Platonic Representation Hypothesis | |
AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | |
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | |
Zero-Shot Tokenizer Transfer | |
RLHF Workflow: From Reward Modeling to Online RLHF | |
LogoMotion: Visually Grounded Code Generation for Content-Aware Animation | |
SUTRA: Scalable Multilingual Language Model Architecture | |
ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization | |
Large Language Models as Planning Domain Generators | |
Explaining Text Similarity in Transformer Models | |
The Hidden Pitfalls of the Cosine Similarity Loss | |
Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | |
Can LLMs Deeply Detect Complex Malicious Queries? A Framework for Jailbreaking via Obfuscating Intent | |
Exposing Attention Glitches with Flip-Flop Language Modeling | |
CodeT5+: Open Code Large Language Models for Code Understanding and Generation | |
CinePile: A Long Video Question Answering Dataset and Benchmark | |
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | |
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory | |
Enhancing Gender-Inclusive Machine Translation with Neomorphemes and Large Language Models | |
Understanding the performance gap between online and offline alignment algorithms | |
SpeechGuard: Exploring the Adversarial Robustness of Multimodal Large Language Models | |
SpeechVerse: A Large-scale Generalizable Audio Language Model | |
Compositional Text-to-Image Generation with Dense Blob Representations | |
Benchmarking Retrieval-Augmented Large Language Models in Biomedical NLP: Application, Robustness, and Self-Awareness | |
People cannot distinguish GPT-4 from a human in a Turing test | |
LLM-Augmented Agent-Based Modelling for Social Simulations: Challenges and Opportunities | |
What Can Natural Language Processing Do for Peer Review? | |
Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | |
Improving Transformers with Dynamically Composable Multi-Head Attention | |
Word2World: Generating Stories and Worlds through Large Language Models | |
Ask Again, Then Fail: Large Language Models' Vacillations in Judgement | |
ALPINE: Unveiling the Planning Capability of Autoregressive Learning in Language Models | |
Xmodel-VLM: A Simple Baseline for Multimodal Vision Language Model | |
Challenges in Deploying Long-Context Transformers: A Theoretical Peak Performance Analysis | |
Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | |
Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | |
Special Characters Attack: Toward Scalable Training Data Extraction From Large Language Models | |
Measuring Implicit Bias in Explicitly Unbiased Large Language Models | |
UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | |
Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning | |
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | |
Chameleon: Mixed-Modal Early-Fusion Foundation Models | |
Many-Shot In-Context Learning in Multimodal Foundation Models | |
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model | |
LoRA Learns Less and Forgets Less | |
Using ChatGPT for Thematic Analysis | |
Are Large Pre-Trained Language Models Leaking Your Personal Information? | |
Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre | |
HMT: Hierarchical Memory Transformer for Long Context Language Processing | |
Air Gap: Protecting Privacy-Conscious Conversational Agents | |
Elements of World Knowledge (EWOK): A cognition-inspired framework for evaluating basic world knowledge in language models | |
LLM-Assisted Rule Based Machine Translation for Low/No-Resource Languages | |
MarkLLM: An Open-Source Toolkit for LLM Watermarking | |
"They are uncultured": Unveiling Covert Harms and Social Threats in LLM Generated Conversations | |
Towards Uncertainty-Aware Language Agent | |
Observational Scaling Laws and the Predictability of Language Model Performance | |
Layer-Condensed KV Cache for Efficient Inference of Large Language Models | |
Inducing Group Fairness in LLM-Based Decisions | |
CELA: Cost-Efficient Language Model Alignment for CTR Prediction | |
RDRec: Rationale Distillation for LLM-based Recommendation | |
A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | |
INDUS: Effective and Efficient Language Models for Scientific Applications | |
Dynamic data sampler for cross-language transfer learning in large language models | |
Grounded 3D-LLM with Referent Tokens | |
PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | |
Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining | |
MEDVOC: Vocabulary Adaptation for Fine-tuning Pre-trained Language Models on Medical Text Summarization | |
WavCraft: Audio Editing and Generation with Large Language Models | |
Large Language Models Fall Short: Understanding Complex Relationships in Detective Narratives | |
Transformers learn to implement preconditioned gradient descent for in-context learning | |
BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting | |
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | |
Imp: Highly Capable Large Multimodal Models for Mobile Devices | |
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | |
Towards Modular LLMs by Building and Reusing a Library of LoRAs | |
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework | |
Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risks | |
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts | |
Latent State Estimation Helps UI Agents to Reason | |
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | |
Large Language Models Meet NLP: A Survey | |
PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference | |
Blind Baselines Beat Membership Inference Attacks for Foundation Models | |
Your Transformer is Secretly Linear | |
Can AI Relate: Testing Large Language Model Response for Mental Health Support | |
Increasing the LLM Accuracy for Question Answering: Ontologies to the Rescue! | |
Large Language Models are Biased Reinforcement Learners | |
ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | |
SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation Tasks | |
Keep It Private: Unsupervised Privatization of Online Text | |
Generative AI and Large Language Models for Cyber Security: All Insights You Need | |
Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents | |
Are Large Language Models Moral Hypocrites? A Study Based on Moral Foundations | |
Leveraging Reinforcement Learning and Large Language Models for Code Optimization | |
Data Contamination Quiz: A Tool to Detect and Estimate Contamination in Large Language Models | |
Large Language Models Are Not Robust Multiple Choice Selectors | |
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | |
Not All Language Model Features Are Linear | |
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data | |
Dense Connector for MLLMs | |
A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | |
Bitune: Bidirectional Instruction-Tuning | |
Lessons from the Trenches on Reproducible Evaluation of Language Models | |
Multi-turn Reinforcement Learning from Preference Human Feedback | |
Base of RoPE Bounds Context Length | |
Top-Down Partitioning for Efficient List-Wise Ranking | |
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | |
xRAG: Extreme Context Compression for Retrieval-augmented Generation with One Token | |
Agent Planning with World Knowledge Model | |
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability | |
Distributed Speculative Inference of Large Language Models | |
Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations | |
RAGE Against the Machine: Retrieval-Augmented LLM Explanations | |
Efficient Multimodal Large Language Models: A Survey | |
Natural Language Can Help Bridge the Sim2Real Gap | |
FlashRAG: A Modular Toolkit for Efficient Retrieval-Augmented Generation Research | |
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models | |
On the Brittle Foundations of ReAct Prompting for Agentic Large Language Models | |
Infinite Limits of Multi-head Transformer Dynamics | |
News Recommendation with Category Description by a Large Language Model | |
Evaluation of the Programming Skills of Large Language Models | |
AI-Assisted Assessment of Coding Practices in Modern Code Review | |
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery | |
Super Tiny Language Models | |
RE-Adapt: Reverse Engineered Adaptation of Large Language Models | |
CoachLM: Automatic Instruction Revisions Improve the Data Quality in LLM Instruction Tuning | |
"According to ...": Prompting Language Models Improves Quoting from Pre-Training Data | |
Instruction Tuning With Loss Over Instructions | |
GPT-4 Jailbreaks Itself with Near-Perfect Success Using Self-Explanation | |
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models | |
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach | |
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models | |
Luban: Building Open-Ended Creative Agents via Autonomous Embodied Verification | |
V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | |
SignLLM: Sign Languages Production Large Language Models | |
Stacking Your Transformers: A Closer Look at Model Growth for Efficient LLM Pre-Training | |
Are Long-LLMs A Necessity For Long-Context Tasks? | |
iVideoGPT: Interactive VideoGPTs are Scalable World Models | |
Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition | |
Extracting Prompts by Inverting LLM Outputs | |
Data movement limits to frontier model training | |
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization | |
Aya 23: Open Weight Releases to Further Multilingual Progress | |
AutoCoder: Enhancing Code Large Language Model with AIEV-Instruct | |
OLAPH: Improving Factuality in Biomedical Long-form Question Answering | |
Tailoring Vaccine Messaging with Common-Ground Opinions | |
Efficient Adversarial Training in LLMs with Continuous Attacks | |
AGRaME: Any-Granularity Ranking with Multi-Vector Embeddings | |
Neural Scaling Laws for Embodied AI | |
Evaluating AI-generated code for C++, Fortran, Go, Java, Julia, Matlab, Python, R, and Rust | |
The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub | |
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | |
G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation | |
"The Death of Wikipedia?" -- Exploring the Impact of ChatGPT on Wikipedia Engagement | |
Let Me Do It For You: Towards LLM Empowered Recommendation via Tool Learning | |
Eliciting Latent Knowledge from Quirky Language Models | |
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | |
Matryoshka Multimodal Models | |
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | |
Transformers Can Do Arithmetic with the Right Embeddings | |
Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning | |
An Introduction to Vision-Language Modeling | |
Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models | |
Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words? | |
Understanding Linear Probing then Fine-tuning Language Models from NTK Perspective | |
Zamba: A Compact 7B SSM Hybrid Model | |
A Survey on LLM Inference-Time Self-Improvement | |
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters | |
MoEUT: Mixture-of-Experts Universal Transformers | |
DAGER: Exact Gradient Inversion for Large Language Models | |
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | |
The Impact of Positional Encoding on Length Generalization in Transformers | |
BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks | |
Phase Transitions in the Output Distribution of Large Language Models | |
Crafting Interpretable Embeddings by Asking LLMs Questions | |
gzip Predicts Data-dependent Scaling Laws | |
Spectral Editing of Activations for Large Language Model Alignment | |
Online Merging Optimizers for Boosting Rewards and Mitigating Tax in Alignment | |
Learning to Reason via Program Generation, Emulation, and Search | |
Hacc-Man: An Arcade Game for Jailbreaking LLMs | |
CLARINET: Augmenting Language Models to Ask Clarification Questions for Retrieval | |
FinTextQA: A Dataset for Long-form Financial Question Answering | |
On the Self-Verification Limitations of Large Language Models on Reasoning and Planning Tasks | |
Don't Forget to Connect! Improving RAG with Graph-based Reranking | |
Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | |
LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | |
Faithful Logical Reasoning via Symbolic Chain-of-Thought | |
2BP: 2-Stage Backpropagation | |
Accelerating Transformer Inference and Training with 2:4 Activation Sparsity | |
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections | |
Fine-tuning Large Language Models with Sequential Instructions | |
Evaluating the Factual Consistency of Large Language Models Through News Summarization | |
Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | |
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs | |
Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | |
MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | |
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | |
Robust Preference Optimization through Reward Model Distillation | |
Jina CLIP: Your CLIP Model Is Also Your Text Retriever | |
Matryoshka Query Transformer for Large Vision-Language Models | |
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets | |
Offline Regularised Reinforcement Learning for Large Language Models Alignment | |
LLMs achieve adult human performance on higher-order theory of mind tasks | |
On the Role of Attention Masks and LayerNorm in Transformers | |
OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | |
Language Models Trained to do Arithmetic Predict Human Risky and Intertemporal Choice | |
On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization | |
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering | |
Xwin-LM: Strong and Scalable Alignment Practice for LLMs | |
GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning | |
Similarity is Not All You Need: Endowing Retrieval Augmented Generation with Multi Layered Thoughts | |
Enhancing Large Vision Language Models with Self-Training on Image Comprehension | |
Preference Learning Algorithms Do Not Learn Preference Rankings | |
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions | |
Contextual Position Encoding: Learning to Count What's Important | |
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement | |
Linking In-context Learning in Transformers to Human Episodic Memory | |
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models | |
Bayesian Online Natural Gradient (BONG) | |
Data Augmentation Vision Transformer for Fine-grained Image Classification | |
MotionLLM: Understanding Human Behaviors from Human Motions and Videos | |
Don't drop your samples! Coherence-aware training benefits Conditional diffusion | |
Large Language Models Can Self-Improve At Web Agent Tasks | |
Group Robust Preference Optimization in Reward-free RLHF | |
Evaluating Large Language Model Biases in Persona-Steered Generation | |
Would I Lie To You? Inference Time Alignment of Language Models using Direct Preference Heads | |
Parrot: Efficient Serving of LLM-based Applications with Semantic Variable | |
Is In-Context Learning Sufficient for Instruction Following in LLMs? | |
Aligning to Thousands of Preferences via System Message Generalization | |
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories | |
Generating Query Recommendations via LLMs | |
Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding | |
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning | |
Position: Foundation Agents as the Paradigm Shift for Decision Making | |
PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | |
A Survey on Vision-Language-Action Models for Embodied AI | |
Large Language Models Can Self-Correct with Minimal Effort | |
Language Models with Conformal Factuality Guarantees | |
Prompt Optimization with Human Feedback | |
GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction | |
RealitySummary: On-Demand Mixed Reality Document Enhancement using Large Language Models | |
Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars | |
Certifiably Robust RAG against Retrieval Corruption | |
Want To Reduce Labeling Cost? GPT-3 Can Help | |
Embedding-Aligned Language Models | |
Qiskit Code Assistant: Training LLMs for generating Quantum Computing Code | |
A Survey of Multimodal Large Language Model from A Data-centric Perspective | |
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis | |
LACIE: Listener-Aware Finetuning for Confidence Calibration in Large Language Models | |
CHIQ: Contextual History Enhancement for Improving Query Rewriting in Conversational Search | |
SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales | |
Large Language Models are Zero-Shot Next Location Predictors | |
There and Back Again: The AI Alignment Paradox | |
Expanded Gating Ranges Improve Activation Functions | |
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models | |
The Geometry of Categorical and Hierarchical Concepts in Large Language Models | |
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA | |
SeamlessExpressiveLM: Speech Language Model for Expressive Speech-to-Speech Translation with Chain-of-Thought | |
Grokfast: Accelerated Grokking by Amplifying Slow Gradients | |
Stress-Testing Capability Elicitation With Password-Locked Models | |
Knowledge Circuits in Pretrained Transformers | |
Safe LoRA: the Silver Lining of Reducing Safety Risks when Fine-tuning Large Language Models | |
Learning the Language of Protein Structure | |
Zyda: A 1.3T Dataset for Open Language Modeling | |
SpatialRGPT: Grounded Spatial Reasoning in Vision Language Model | |
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark | |
Towards Scalable Automated Alignment of LLMs: A Survey | |
Pretrained Hybrids with MAD Skills | |
Show, Don't Tell: Aligning Language Models with Demonstrated Feedback | |
BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling | |
Controlling Large Language Model Agents with Entropic Activation Steering | |
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians | |
Transfer Q Star: Principled Decoding for LLM Alignment | |
Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | |
From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step | |
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory | |
To Believe or Not to Believe Your LLM | |
Scalable MatMul-free Language Modeling | |
Meta-Designing Quantum Experiments with Language Models | |
Extended Mind Transformers | |
Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models | |
LLMs Beyond English: Scaling the Multilingual Capability of LLMs with Cross-Lingual Feedback | |
How to Understand Whole Software Repository? | |
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs | |
Automated Focused Feedback Generation for Scientific Writing Assistance | |
PosterLLaVa: Constructing a Unified Multi-modal Layout Generator with LLM | |
CLIPLoss and Norm-Based Data Selection Methods for Multimodal Contrastive Learning | |
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters | |
Scaling Laws for Reward Model Overoptimization in Direct Alignment Algorithms | |
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes | |
PLaD: Preference-based Large Language Model Distillation with Pseudo-Preference Pairs | |
Item-Language Model for Conversational Recommendation | |
Block Transformer: Global-to-Local Language Modeling for Fast Inference | |
Parrot: Multilingual Visual Instruction Tuning | |
Multiple Choice Questions and Large Languages Models: A Case Study with Fictional Medical Data | |
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | |
A Study of Optimizations for Fine-tuning Large Language Models | |
Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses | |
The Impossibility of Fair LLMs | |
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models | |
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments | |
Are We Done with MMLU? | |
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search | |
Seq1F1B: Efficient Sequence-Level Pipeline Parallelism for Large Language Model Training | |
QJL: 1-Bit Quantized JL Transform for KV Cache Quantization with Zero Overhead | |
Pre-trained Large Language Models Use Fourier Features to Compute Addition | |
CLMASP: Coupling Large Language Models with Answer Set Programming for Robotic Task Planning | |
PrE-Text: Training Language Models on Private Federated Data in the Age of LLMs | |
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks | |
DiffUHaul: A Training-Free Method for Object Dragging in Images | |
Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model | |
ABodyBuilder3: Improved and scalable antibody structure predictions | |
A Data-Centric Approach To Generate Faithful and High Quality Patient Summaries with Large Language Models | |
DsDm: Model-Aware Dataset Selection with Datamodels | |
Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools | |
Improving Alignment and Robustness with Short Circuiting | |
Semantically Diverse Language Generation for Uncertainty Estimation in Language Models | |
Matching Anything by Segmenting Anything | |
What Do Language Models Learn in Context? The Structured Task Hypothesis | |
Scaling and evaluating sparse autoencoders | |
Verbalized Machine Learning: Revisiting Machine Learning with Language Models | |
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller | |
Iteration Head: A Mechanistic Study of Chain-of-Thought | |
Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention | |
Does your data spark joy? Performance gains from domain upsampling at the end of training | |
WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild | |
CRAG -- Comprehensive RAG Benchmark | |
Mixture-of-Agents Enhances Large Language Model Capabilities | |
Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach | |
MAIRA-2: Grounded Radiology Report Generation | |
Proofread: Fixes All Errors with One Tap | |
NATURAL PLAN: Benchmarking LLMs on Natural Language Planning | |
Large Language Model Confidence Estimation via Black-Box Access | |
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | |
Towards a Personal Health Large Language Model | |
Tx-LLM: A Large Language Model for Therapeutics | |
ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization | |
Unified Text-to-Image Generation and Retrieval | |
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers | |
BERTs are Generative In-Context Learners | |
Is Free Self-Alignment Possible? | |
TACT: Advancing Complex Aggregative Reasoning with Information Extraction Tools | |
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models | |
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | |
Creativity Has Left the Chat: The Price of Debiasing Language Models | |
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor | |
Can Language Models Serve as Text-Based World Simulators? | |
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad | |
Contrastive learning of T cell receptor representations | |
Self-Tuning: Instructing LLMs to Effectively Acquire New Knowledge through Self-Teaching | |
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | |
MATES: Model-Aware Data Selection for Efficient Pretraining with Data Influence Models | |
MedREQAL: Examining Medical Knowledge Recall of Large Language Models via Question Answering | |
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | |
OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models | |
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | |
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks | |
On the Reliability of Watermarks for Large Language Models | |
A Survey of Diffusion Models in Natural Language Processing | |
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be | |
Learning to Grow Pretrained Models for Efficient Transformer Training | |
An Image is Worth 32 Tokens for Reconstruction and Generation | |
Simple and Effective Masked Diffusion Language Models | |
Instant 3D Human Avatar Generation using Image Diffusion Models | |
TextGrad: Automatic "Differentiation" via Text | |
Spectrum: Targeted Training on Signal to Noise Ratio | |
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |
Multimodal Belief Prediction | |
McEval: Massively Multilingual Code Evaluation | |
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B | |
Merging Improves Self-Critique Against Jailbreak Attacks | |
Confabulation: The Surprising Value of Large Language Model Hallucinations | |
The Prompt Report: A Systematic Survey of Prompting Techniques | |
Improve Mathematical Reasoning in Language Models by Automated Process Supervision | |
MedFuzz: Exploring the Robustness of Large Language Models in Medical Question Answering | |
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | |
Parallelizing Linear Transformers with the Delta Rule over Sequence Length | |
LLM Dataset Inference: Did you train on my dataset? | |
Towards Lifelong Learning of Large Language Models: A Survey | |
PowerInfer-2: Fast Large Language Model Inference on a Smartphone | |
LINGOLY: A Benchmark of Olympiad-Level Linguistic Reasoning Puzzles in Low-Resource and Extinct Languages | |
Attention as a Hypernetwork | |
ConStat: Performance-Based Contamination Detection in Large Language Models | |
What If We Recaption Billions of Web Images with LLaMA-3? | |
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing | |
Discovering Preference Optimization Algorithms with and for Large Language Models | |
Large Language Models Must Be Taught to Know What They Don't Know | |
An Empirical Study of Mamba-based Language Models | |
Collective Constitutional AI: Aligning a Language Model with Public Input | |
3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination | |
Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models | |
Explore the Limits of Omni-modal Pretraining at Scale | |
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition | |
Large Language Model Unlearning via Embedding-Corrupted Prompts | |
Grounding Multimodal Large Language Models in Actions | |
BertaQA: How Much Do Language Models Know About Local Culture? | |
VCR: Visual Caption Restoration | |
Hibou: A Family of Foundational Vision Transformers for Pathology | |
Repurposing Language Models into Embedding Models: Finding the Compute-Optimal Recipe | |
Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost | |
Improving Retrieval for RAG based Question Answering Models on Financial Documents | |
On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey | |
MuirBench: A Comprehensive Benchmark for Robust Multi-image Understanding | |
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities | |
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models | |
MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations | |
Optimised Grouped-Query Attention Mechanism for Transformers | |
Transformers meet Neural Algorithmic Reasoners | |
MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding | |
Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback | |
OpenVLA: An Open-Source Vision-Language-Action Model | |
ReMI: A Dataset for Reasoning with Multiple Images | |
Test of Time: A Benchmark for Evaluating LLMs on Temporal Reasoning | |
EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts | |
Investigating the translation capabilities of Large Language Models trained on parallel data only | |
Multi-Agent Software Development through Cross-Team Collaboration | |
BGE M3-Embedding: Multi-Lingual, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation | |
mOSCAR: A Large-scale Multilingual and Multimodal Document-level Corpus | |
UnO: Unsupervised Occupancy Fields for Perception and Forecasting | |
HelpSteer2: Open-source dataset for training top-performing reward models | |
Mistral-C2F: Coarse to Fine Actor for Analytical and Reasoning Enhancement in RLHF and Effective-Merged LLMs | |
Language Model Council: Benchmarking Foundation Models on Highly Subjective Tasks by Consensus | |
CS-Bench: A Comprehensive Benchmark for Large Language Models towards Computer Science Mastery | |
Real2Code: Reconstruct Articulated Objects via Code Generation | |
DafnyBench: A Benchmark for Formal Software Verification | |
Estimating the Hallucination Rate of Generative AI | |
RWKV-CLIP: A Robust Vision-Language Representation Learner | |
CVQA: Culturally-diverse Multilingual Visual Question Answering Benchmark | |
Early Weight Averaging meets High Learning Rates for LLM Pre-training | |
Text Embeddings by Weakly-Supervised Contrastive Pre-training | |
Promptagator: Few-shot Dense Retrieval From 8 Examples | |
RetroMAE: Pre-Training Retrieval-oriented Language Models Via Masked Auto-Encoder | |
InPars: Data Augmentation for Information Retrieval using Large Language Models | |
Reconciling Kaplan and Chinchilla Scaling Laws | |
Improving Efficient Neural Ranking Models with Cross-Architecture Knowledge Distillation | |
Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval | |
SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals | |
Cycles of Thought: Measuring LLM Confidence through Stable Explanations | |
From Tarzan to Tolkien: Controlling the Language Proficiency Level of LLMs for Content Generation | |
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models | |
Are you still on track!? Catching LLM Task Drift with Activations | |
Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement | |
UICoder: Finetuning Large Language Models to Generate User Interface Code through Automated Feedback | |
Quantifying Variance in Evaluation Benchmarks | |
Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs | |
Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models | |
BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack | |
Evaluation of Large Language Models: STEM education and Gender Stereotypes | |
Exploring the Correlation between Human and Machine Evaluation of Simultaneous Speech Translation | |
Mixture-of-Subspaces in Low-Rank Adaptation | |
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation | |
CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescriptions | |
GEB-1.3B: Open Lightweight Large Language Model | |
Rapport-Driven Virtual Agent: Rapport Building Dialogue Strategy for Improving User Experience at First Meeting | |
A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery | |
Large language model validity via enhanced conformal prediction methods | |
Decoding the Diversity: A Review of the Indic AI Research Landscape | |
Advancing High Resolution Vision-Language Models in Biomedicine | |
Bayesian Statistical Modeling with Predictors from LLMs | |
Self-Supervised Speech Representations are More Phonetic than Semantic | |
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text | |
Needle In A Multimodal Haystack | |
mDPO: Conditional Preference Optimization for Multimodal Large Language Models | |
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% | |
DataComp-LM: In search of the next generation of training sets for language models | |
Transcoders Beat Sparse Autoencoders for Interpretability | |
Set-Based Prompting: Provably Solving the Language Model Order Dependency Problem | |
The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models | |
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models | |
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning | |
MMDU: A Multi-Turn Multi-Image Dialog Understanding Benchmark and Instruction-Tuning Dataset for LVLMs | |
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models | |
Language Modeling with Editable External Knowledge | |
WPO: Enhancing RLHF with Weighted Preference Optimization | |
VideoLLM-online: Online Video Large Language Model for Streaming Video | |
How Do Large Language Models Acquire Factual Knowledge During Pretraining? | |
Task Me Anything | |
Refusal in Language Models Is Mediated by a Single Direction | |
DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models | |
Evaluating Open Language Models Across Task Types, Application Domains, and Reasoning Types: An In-Depth Experimental Analysis | |
GUICourse: From General Vision Language Models to Versatile GUI Agents | |
MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens | |
In-Context Editing: Learning Knowledge from Self-Induced Distributions | |
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences | |
THEANINE: Revisiting Memory Management in Long-term Conversations with Timeline-augmented Response Generation | |
Breaking the Attention Bottleneck | |
STAR: SocioTechnical Approach to Red Teaming Language Models | |
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents | |
HiddenTables & PyQTax: A Cooperative Game and Dataset For TableQA to Ensure Scale and Data Privacy Across a Myriad of Taxonomies | |
CoLoR-Filter: Conditional Loss Reduction Filtering for Targeted Language Model Pre-training | |
AudioPaLM: A Large Language Model That Can Speak and Listen | |
ChatGLM: A Family of Large Language Models from GLM-130B to GLM-4 All Tools | |
RoboCat: A Self-Improving Generalist Agent for Robotic Manipulation | |
MotionGPT: Finetuned LLMs Are General-Purpose Motion Generators | |
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation | |
Full Parameter Fine-tuning for Large Language Models with Limited Resources | |
Improving Multi-Agent Debate with Sparse Communication Topology | |
Meta Reasoning for Large Language Models | |
A Simple and Effective $L_2$ Norm-Based Strategy for KV Cache Compression | |
Unifying Multimodal Retrieval via Document Screenshot Embedding | |
Humor in AI: Massive Scale Crowd-Sourced Preferences and Benchmarks for Cartoon Captioning | |
Deep Bayesian Active Learning for Preference Modeling in Large Language Models | |
OLMES: A Standard for Language Model Evaluations | |
Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement | |
What Are the Odds? Language Models Are Capable of Probabilistic Reasoning | |
From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries | |
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI | |
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning | |
Self-Distillation for Model Stacking Unlocks Cross-Lingual NLU in 200+ Languages | |
Hierarchical Prompting Taxonomy: A Universal Evaluation Framework for Large Language Models | |
News Without Borders: Domain Adaptation of Multilingual Sentence Embeddings for Cross-lingual News Recommendation | |
Open-Source Web Service with Morphological Dictionary-Supplemented Deep Learning for Morphosyntactic Analysis of Czech | |
Mixture of Scales: Memory-Efficient Token-Adaptive Binarization for Large Language Models | |
JEN-1 DreamStyler: Customized Musical Concept Learning via Pivotal Parameters Tuning | |
VoCo-LLaMA: Towards Vision Compression with Large Language Models | |
TroL: Traversal of Layers for Large Language and Vision Models | |
BPO: Supercharging Online Preference Learning by Adhering to the Proximity of Behavior LLM | |
Statistical Uncertainty in Word Embeddings: GloVe-V | |
Language Models are Surprisingly Fragile to Drug Names in Biomedical Benchmarks | |
Large Scale Transfer Learning for Tabular Data via Language Modeling | |
Transcoders Find Interpretable LLM Feature Circuits | |
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | |
RepLiQA: A Question-Answering Dataset for Benchmarking LLMs on Unseen Reference Content | |
Tokenization Falling Short: The Curse of Tokenization | |
Can LLM be a Personalized Judge? | |
NAST: Noise Aware Speech Tokenization for Speech Language Models | |
Bootstrapping Language Models with DPO Implicit Rewards | |
The Impact of Initialization on LoRA Finetuning Dynamics | |
StatBot.Swiss: Bilingual Open Data Exploration in Natural Language | |
Adversarial Attacks on Multimodal Agents | |
Estimating Knowledge in Large Language Models Without Generating a Single Token | |
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning | |
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models | |
Prompt Design Matters for Computational Social Science Tasks but in Unpredictable Ways | |
From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline | |
A Survey of Large Language Models for Financial Applications: Progress, Prospects and Challenges | |
Safety Arithmetic: A Framework for Test-time Safety Alignment of Language Models by Steering Parameters and Activations | |
Long Code Arena: a Set of Benchmarks for Long-Context Code Models | |
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization | |
Do Not Design, Learn: A Trainable Scoring Function for Uncertainty Estimation in Generative LLMs | |
MMBench-Video: A Long-Form Multi-Shot Benchmark for Holistic Video Understanding | |
Instruction Pre-Training: Language Models are Supervised Multitask Learners | |
LLMatDesign: Autonomous Materials Discovery with Large Language Models | |
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? | |
AgentReview: Exploring Peer Review Dynamics with LLM Agents | |
$\tau$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains | |
Self-MoE: Towards Compositional Large Language Models with Self-Specialized Experts | |
Are LLMs Naturally Good at Synthetic Tabular Data Generation? | |
DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning | |
Measuring memorization in RLHF for code completion | |
Intrinsic Evaluation of Unlearning Using Parametric Knowledge Traces | |
Multimodal Needle in a Haystack: Benchmarking Long-Context Capability of Multimodal Large Language Models | |
Breaking Boundaries: Investigating the Effects of Model Editing on Cross-linguistic Performance | |
garak: A Framework for Security Probing Large Language Models | |
Leading Whitespaces of Language Models' Subword Vocabulary Poses a Confound for Calculating Word Probabilities | |
GenQA: Generating Millions of Instructions from a Handful of Prompts | |
Transferring Knowledge from Large Foundation Models to Small Downstream Models | |
NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security | |
Model Merging and Safety Alignment: One Bad Model Spoils the Bunch | |
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities | |
Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs | |
DeciMamba: Exploring the Length Extrapolation Potential of Mamba | |
Evidence of a log scaling law for political persuasion with large language models | |
LiveMind: Low-latency Large Language Models with Simultaneous Inference | |
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning | |
Model Internals-based Answer Attribution for Trustworthy Retrieval-Augmented Generation | |
Improving Visual Commonsense in Language Models via Multiple Image Generation | |
Nicer Than Humans: How do Large Language Models Behave in the Prisoner's Dilemma? | |
Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models | |
PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers | |
Iterative Length-Regularized Direct Preference Optimization: A Case Study on Improving 7B Language Models to GPT-4 Level | |
HARE: HumAn pRiors, a key to small language model Efficiency | |
Delving into ChatGPT usage in academic writing through excess vocabulary | |
A Tale of Trust and Accuracy: Base vs. Instruct LLMs in RAG Systems | |
Interpretability of Language Models via Task Spaces | |
Surface Form Competition: Why the Highest Probability Answer Isn't Always Right | |
Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | |
CodeRAG-Bench: Can Retrieval Augment Code Generation? | |
A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models | |
Large Language Models are Null-Shot Learners | |
SGLang: Efficient Execution of Structured Language Model Programs | |
LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs | |
Reward Steering with Evolutionary Heuristics for Decoding-time Alignment | |
Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation | |
Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models | |
How Well Do LLMs Represent Values Across Cultures? Empirical Analysis of LLM Responses Based on Hofstede Cultural Dimensions | |
Learning to Retrieve Iteratively for In-Context Learning | |
Jailbreaking as a Reward Misspecification Problem | |
Information Guided Regularization for Fine-tuning Language Models | |
Unlocking the Global Synergies in Low-Rank Adapters | |
Towards Retrieval Augmented Generation over Large Video Libraries | |
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection | |
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework | |
RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation | |
Exploring Design Choices for Building Language-Specific LLMs | |
ICAL: Continual Learning of Multimodal Agents by Transforming Trajectories into Actionable Insights | |
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold | |
African or European Swallow? Benchmarking Large Vision-Language Models for Fine-Grained Object Classification | |
Complexity of Symbolic Representation in Working Memory of Transformer Correlates with the Complexity of a Task | |
Data Contamination Can Cross Language Barriers | |
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges | |
Interactive Evolution: A Neural-Symbolic Self-Training Framework For Large Language Models | |
Multimodal Structured Generation: CVPR's 2nd MMFM Challenge Technical Report | |
Probing the Decision Boundaries of In-context Learning in Large Language Models | |
CancerLLM: A Large Language Model in Cancer Domain | |
CarLLaVA: Vision language models for camera-only closed-loop driving | |
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs | |
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation | |
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters | |
VideoHallucer: Evaluating Intrinsic and Extrinsic Hallucinations in Large Video-Language Models | |
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions | |
OATH-Frames: Characterizing Online Attitudes Towards Homelessness with LLM Assistants | |
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models | |
Long Context Transfer from Language to Vision | |
Efficient Continual Pre-training by Mitigating the Stability Gap | |
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs | |
Sparse High Rank Adapters | |
Flow of Reasoning: Efficient Training of LLM Policy with Divergent Thinking | |
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages | |
WARP: On the Benefits of Weight Averaged Rewarded Policies | |
Scaling Laws for Linear Complexity Language Models | |
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | |
Preference Tuning For Toxicity Mitigation Generalizes Across Languages | |
FIRST: Faster Improved Listwise Reranking with Single Token Decoding | |
InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context | |
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers | |
AutoDetect: Towards a Unified Framework for Automated Weakness Detection in Large Language Models | |
Confidence Regulation Neurons in Language Models | |
Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization | |
Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs | |
Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models | |
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models | |
How Many Parameters Does it Take to Change a Light Bulb? Evaluating Performance in Self-Play of Conversational Games as a Function of Model Characteristics | |
Can Few-shot Work in Long-Context? Recycling the Context to Generate Demonstrations | |
Hallucination is Inevitable: An Innate Limitation of Large Language Models | |
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees | |
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models | |
Steering Without Side Effects: Improving Post-Deployment Control of Language Models | |
Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention | |
Brain-Like Language Processing via a Shallow Untrained Multihead Attention Network | |
PARIKSHA: A Large-Scale Investigation of Human-LLM Evaluator Agreement on Multilingual and Multi-Cultural Data | |
MultiAgent Collaboration Attack: Investigating Adversarial Attacks in Large Language Model Collaborations via Debate | |
PostMark: A Robust Blackbox Watermark for Large Language Models | |
Can LLMs Learn Macroeconomic Narratives from Social Media? | |
Embodied Instruction Following in Unknown Environments | |
MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning | |
Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon | |
Data curation via joint example selection further accelerates multimodal learning | |
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment | |
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients | |
LongIns: A Challenging Long-context Instruction-based Exam for LLMs | |
Multi-property Steering of Large Language Models with Dynamic Activation Composition | |
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale | |
Benchmarking Mental State Representations in Language Models | |
Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA | |
Delving into the Utilisation of ChatGPT in Scientific Publications in Astronomy | |
How to Compute the Probability of a Word | |
Unlocking Continual Learning Abilities in Language Models | |
Large Language Models Assume People are More Rational than We Really are | |
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track | |
Finding Transformer Circuits with Edge Pruning | |
A mathematical perspective on Transformers | |
Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation | |
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt | |
LLMs' Classification Performance is Overclaimed | |
Cross-Modality Safety Alignment | |
Bridging Law and Data: Augmenting Reasoning via a Semi-Structured Dataset with IRAC methodology | |
Preference Distillation for Personalized Generative Recommendation | |
DialSim: A Real-Time Simulator for Evaluating Long-Term Dialogue Understanding of Conversational Agents | |
Evaluating $n$-Gram Novelty of Language Models Using Rusty-DAWG | |
Connecting the Dots: Evaluating Abstract Reasoning Capabilities of LLMs Using the New York Times Connections Word Game | |
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving | |
Associative Recurrent Memory Transformer | |
Symbolic Learning Enables Self-Evolving Agents | |
CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs | |
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models | |
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | |
From Rewriting to Remembering: Common Ground for Conversational QA Models | |
Adversarial Search Engine Optimization for Large Language Models | |
A Closer Look into Mixture-of-Experts in Large Language Models | |
Multimodal foundation world models for generalist embodied agents | |
Do they mean 'us'? Interpreting Referring Expressions in Intergroup Bias | |
MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool | |
Efficacy of Language Model Self-Play in Non-Zero-Sum Games | |
Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models | |
Large Language Models are Interpretable Learners | |
Are Language Models Actually Useful for Time Series Forecasting? | |
CAVE: Controllable Authorship Verification Explanations | |
Building on Efficient Foundations: Effectively Training LLMs with Structured Feedforward Layers | |
EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records | |
One Thousand and One Pairs: A "novel" challenge for long-context language models | |
Breaking the Frame: Image Retrieval by Visual Overlap Prediction | |
CaT-BENCH: Benchmarking Language Model Understanding of Causal and Temporal Dependencies in Plans | |
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning | |
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models | |
EvoAgent: Towards Automatic Multi-Agent Generation via Evolutionary Algorithms | |
Synchronous Faithfulness Monitoring for Trustworthy Retrieval-Augmented Generation | |
A Benchmark for Learning to Translate a New Language from One Grammar Book | |
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding | |
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding | |
Aligning Teacher with Student Preferences for Tailored Training Data Generation | |
Simulating Classroom Education with LLM-Empowered Agents | |
SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation | |
Re-Ranking Step by Step: Investigating Pre-Filtering for Re-Ranking with Large Language Models | |
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs | |
LLM Targeted Underperformance Disproportionately Impacts Vulnerable Users | |
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression | |
Can LLMs Learn by Teaching? A Preliminary Study | |
The Devil is in the Neurons: Interpreting and Mitigating Social Biases in Pre-trained Language Models | |
Is Programming by Example solved by LLMs? | |
Suri: Multi-constraint Instruction Following for Long-form Text Generation | |
Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs? | |
LiveBench: A Challenging, Contamination-Free LLM Benchmark | |
From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data | |
VERISCORE: Evaluating the factuality of verifiable claims in long-form text generation | |
Revealing Fine-Grained Values and Opinions in Large Language Models | |
T-FREE: Tokenizer-Free Generative LLMs via Sparse Representations for Memory-Efficient Embeddings | |
Manipulate-Anything: Automating Real-World Robots using Vision-Language Models | |
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data | |
Understand What LLM Needs: Dual Preference Alignment for Retrieval-Augmented Generation | |
ResumeAtlas: Revisiting Resume Classification with Large-Scale Datasets and Large Language Models | |
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs | |
News Deja Vu: Connecting Past and Present with Semantic Search | |
Contrastive Entity Coreference and Disambiguation for Historical Texts | |
SAIL: Self-Improving Efficient Online Alignment of Large Language Models | |
AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models | |
Sonnet or Not, Bot? Poetry Evaluation for Large Models and Datasets | |
BEEAR: Embedding-based Adversarial Removal of Safety Backdoors in Instruction-tuned Language Models | |
Reasoning or Simply Next Token Prediction? A Benchmark for Stress-Testing Large Language Models | |
Macroeconomic Forecasting with Large Language Models | |
Self-Retrieval: Building an Information Retrieval System with One Large Language Model | |
Cognitive Architectures for Language Agents | |
Adaptable Logical Control for Large Language Models | |
The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More | |
DistiLRR: Transferring Code Repair for Low-Resource Programming Languages | |
A Critical Study of What Code-LLMs (Do Not) Learn | |
"Is ChatGPT a Better Explainer than My Professor?": Evaluating the Explanation Capabilities of LLMs in Conversation Compared to a Human Baseline | |
Efficient Evolutionary Search Over Chemical Space with Large Language Models | |
Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs | |
Understanding and Mitigating Language Confusion in LLMs | |
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy | |
Scaling Synthetic Data Creation with 1,000,000,000 Personas | |
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning | |
The Remarkable Robustness of LLMs: Stages of Inference? | |
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale | |
Following Length Constraints in Instructions | |
AutoRAG-HP: Automatic Online Hyper-Parameter Tuning for Retrieval-Augmented Generation | |
Token Erasure as a Footprint of Implicit Vocabulary Items in LLMs | |
Molecular Facts: Desiderata for Decontextualization in LLM Fact Verification | |
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model | |
Direct Preference Knowledge Distillation for Large Language Models | |
Investigating How Large Language Models Leverage Internal Knowledge to Perform Complex Reasoning | |
Monitoring Latent World States in Language Models with Propositional Probes | |
RouteLLM: Learning to Route LLMs with Preference Data | |
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks | |
DARG: Dynamic Evaluation of Large Language Models via Adaptive Reasoning Graph | |
RaTEScore: A Metric for Radiology Report Generation | |
PhyloLM: Inferring the Phylogeny of Large Language Models and Predicting their Performances in Benchmarks | |
Flora: Low-Rank Adapters Are Secretly Gradient Compressors | |
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language | |
Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation | |
Scaling Laws for Fact Memorization of Large Language Models | |
Less is More: Accurate Speech Recognition & Translation without Web-Scale Data | |
RegMix: Data Mixture as Regression for Language Model Pre-training | |
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives | |
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging | |
ColPali: Efficient Document Retrieval with Vision Language Models | |
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion | |
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems | |
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? | |
Show Less, Instruct More: Enriching Prompts with Definitions and Guidelines for Zero-Shot NER | |
MIRAI: Evaluating LLM Agents for Event Forecasting | |
Searching for Best Practices in Retrieval-Augmented Generation | |
$\text{Memory}^3$: Language Modeling with Explicit Memory | |
Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation | |
BERGEN: A Benchmarking Library for Retrieval-Augmented Generation | |
M2QA: Multi-domain Multilingual Question Answering | |
Step-Controlled DPO: Leveraging Stepwise Error for Enhanced Mathematical Reasoning | |
Chain-of-Knowledge: Integrating Knowledge Reasoning into Large Language Models by Learning from Knowledge Graphs | |
Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning | |
MMEvalPro: Calibrating Multimodal Benchmarks Towards Trustworthy and Efficient Evaluation | |
Brevity is the soul of wit: Pruning long files for code generation | |
The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention | |
From RAG to RICHES: Retrieval Interlaced with Sequence Generation | |
LiteSearch: Efficacious Tree Search for LLM | |
Detection and Measurement of Syntactic Templates in Generated Text | |
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | |
OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents | |
Accurate Prediction of Ligand-Protein Interaction Affinities with Fine-Tuned Small Language Models | |
UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI | |
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | |
Compressing Search with Language Models | |
Combinatorial Reasoning: Selecting Reasons in Generative AI Pipelines via Combinatorial Optimization | |
ProgressGym: Alignment with a Millennium of Moral Progress | |
The SIFo Benchmark: Investigating the Sequential Instruction Following Ability of Large Language Models | |
Changing Answer Order Can Decrease MMLU Accuracy | |
MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | |
Understanding Alignment in Multimodal LLMs: A Comprehensive Study | |
ValueScope: Unveiling Implicit Norms and Values via Return Potential Model of Social Interactions | |
Why does in-context learning fail sometimes? Evaluating in-context learning on open and closed questions | |
To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models | |
μ-Bench: A Vision-Language Benchmark for Microscopy Understanding | |
A Review of Large Language Models and Autonomous Agents in Chemistry | |
Agentless: Demystifying LLM-based Software Engineering Agents | |
Eliminating Position Bias of Language Models: A Mechanistic Approach | |
Resolving Discrepancies in Compute-Optimal Scaling of Language Models | |
OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset | |
FLoRA: Low-Rank Core Space for N-dimension | |
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | |
TokenPacker: Efficient Visual Projector for Multimodal LLM | |
Investigating Decoder-only Large Language Models for Speech-to-text Translation | |
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models | |
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | |
Evaluating Human Alignment and Model Faithfulness of LLM Rationale | |
Finding Blind Spots in Evaluator LLMs with Interpretable Checklists | |
On the Limitations of Fine-tuned Judge Models for LLM Evaluation | |
Is LLM-as-a-Judge Robust? Investigating Universal Adversarial Attacks on Zero-shot LLM Assessment | |
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | |
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes | |
Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning | |
Tweetorial Hooks: Generative AI Tools to Motivate Science on Social Media | |
A Solvable Model of Neural Scaling Laws | |
Hopfield Networks is All You Need | |
Improving Transformer Models by Reordering their Sublayers | |
A False Sense of Safety: Unsafe Information Leakage in 'Safe' AI Responses | |
Prompt Stability Scoring for Text Annotation with Large Language Models | |
Survey on Knowledge Distillation for Large Language Models: Methods, Evaluation, and Application | |
AI-native Memory: A Pathway from LLMs Towards AGI | |
Do Models Explain Themselves? Counterfactual Simulatability of Natural Language Explanations | |
From Efficient Multimodal Models to World Models: A Survey | |
Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments | |
LLMs can learn self-restraint through iterative self-reflection | |
ReGround: Improving Textual and Spatial Grounding at No Cost | |
EvalLM: Interactive Evaluation of Large Language Model Prompts on User-Defined Criteria | |
Large language models can accurately predict searcher preferences | |
Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering | |
Large Language Models Enable Few-Shot Clustering | |
LM vs LM: Detecting Factual Errors via Cross Examination | |
Perspectives on Large Language Models for Relevance Judgment | |
Human-like Summarization Evaluation with ChatGPT | |
ChatGPT as a Factual Inconsistency Evaluator for Text Summarization | |
Self-Evaluation as a Defense Against Adversarial Attacks on LLMs | |
How Does Quantization Affect Multilingual LLMs? | |
Are Large Language Models Consistent over Value-laden Questions? | |
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs | |
Tree Search for Language Model Agents | |
Towards Compositionality in Concept Learning | |
Unified Auto-Encoding with Masked Diffusion | |
GraphEdit: Large Language Models for Graph Structure Learning | |
Meta Large Language Model Compiler: Foundation Models of Compiler Optimization | |
LLM-Select: Feature Selection with Large Language Models | |
Improving Reward Models with Synthetic Critiques | |
JailbreakZoo: Survey, Landscapes, and Horizons in Jailbreaking Large Language and Vision-Language Models | |
An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models | |
Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs | |
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition | |
On scalable oversight with weak LLMs judging strong LLMs | |
Fast Forwarding Low-Rank Training | |
Learning to (Learn at Test Time): RNNs with Expressive Hidden States | |
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models | |
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents | |
Mixture of A Million Experts | |
DotaMath: Decomposition of Thought with Code Assistance and Self-correction for Mathematical Reasoning | |
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs | |
LLM-jp: A Cross-organizational Project for the Research and Development of Fully Open Japanese LLMs | |
Anthropocentric bias and the possibility of artificial cognition | |
AgentInstruct: Toward Generative Teaching with Agentic Flows | |
HEMM: Holistic Evaluation of Multimodal Foundation Models | |
Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks | |
52B to 1T: Lessons Learned via Tele-FLM Series | |
Reasoning in Large Language Models: A Geometric Perspective | |
RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs | |
Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling | |
Synthetic Multimodal Question Generation | |
Unveiling Encoder-Free Vision-Language Models | |
$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens | |
Distilling System 2 into System 1 | |
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages | |
RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models | |
Granular Privacy Control for Geolocation with Vision Language Models | |
VRSD: Rethinking Similarity and Diversity for Retrieval in Large Language Models | |
Zero-shot Persuasive Chatbots with LLM-Generated Strategies and Information Retrieval | |
Large Language Model as an Assignment Evaluator: Insights, Feedback, and Challenges in a 1000+ Student Course | |
FBI-LLM: Scaling Up Fully Binarized LLMs from Scratch via Autoregressive Distillation | |
Multi-Object Hallucination in Vision-Language Models | |
ANOLE: An Open, Autoregressive, Native Large Multimodal Model for Interleaved Image-Text Generation | |
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models | |
From Loops to Oops: Fallback Behaviors of Language Models Under Uncertainty | |
PAS: Data-Efficient Plug-and-Play Prompt Augmentation System | |
An Empirical Comparison of Vocabulary Expansion and Initialization Approaches for Language Models | |
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct | |
LLMBox: A Comprehensive Library for Large Language Models | |
Training Task Experts through Retrieval Based Distillation | |
Language Models Encode Collaborative Signals in Recommendation | |
ANAH-v2: Scaling Analytical Hallucination Annotation of Large Language Models | |
When LLMs Play the Telephone Game: Cumulative Changes and Attractors in Iterated Cultural Transmissions | |
LLMAEL: Large Language Models are Good Context Augmenters for Entity Linking | |
Evaluating Language Model Context Windows: A "Working Memory" Test and Inference-time Correction | |
MeMemo: On-device Retrieval Augmentation for Private and Personalized Text Generation | |
Machine Unlearning Fails to Remove Data Poisoning Attacks | |
BeHonest: Benchmarking Honesty in Large Language Models | |
Emu: Generative Pretraining in Multimodality | |
Enabling Large Language Models to Generate Text with Citations | |
Adapting LLMs to Hebrew: Unveiling DictaLM 2.0 with Enhanced Vocabulary and Instruction Capabilities | |
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps | |
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence | |
Vision language models are blind | |
Composable Interventions for Language Models | |
A Single Transformer for Scalable Vision-Language Modeling | |
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension | |
Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs | |
Decoding-Time Language Model Alignment with Multiple Objectives | |
WebCanvas: Benchmarking Web Agents in Online Environments | |
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths | |
Visual representations in the human brain are aligned with large language models | |
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models | |
RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension | |
Inference Performance Optimization for Large Language Models on CPUs | |
LETS-C: Leveraging Language Embedding for Time Series Classification | |
Just read twice: closing the recall gap for recurrent language models | |
How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions | |
TheoremLlama: Transforming General-Purpose LLMs into Lean4 Experts | |
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging | |
Knowledge Composition using Task Vectors with Learned Anisotropic Scaling | |
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations | |
Forcing Diffuse Distributions out of Language Models | |
Evaluating LLMs at Detecting Errors in LLM Responses | |
LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models | |
R-Tuning: Instructing Large Language Models to Say 'I Don't Know' | |
Label Supervised LLaMA Finetuning | |
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models | |
PentestGPT: An LLM-empowered Automatic Penetration Testing Tool | |
Non Verbis, Sed Rebus: Large Language Models are Weak Solvers of Italian Rebuses | |
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models | |
Review-LLM: Harnessing Large Language Models for Personalized Review Generation | |
Do Vision and Language Models Share Concepts? A Vector Space Alignment Study | |
MAVIS: Mathematical Visual Instruction Tuning | |
Automata-based constraints for language model decoding | |
GTA: A Benchmark for General Tool Agents | |
SEED-Story: Multimodal Long Story Generation with Large Language Model | |
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective | |
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On | |
PersonaRAG: Enhancing Retrieval-Augmented Generation Systems with User-Centric Agents | |
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception | |
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients | |
DiscoveryBench: Towards Data-Driven Discovery with Large Language Models | |
MiniCache: KV Cache Compression in Depth Dimension for Large Language Models | |
Genomic Language Models: Opportunities and Challenges | |
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model | |
Self-Recognition in Language Models | |
Deconstructing What Makes a Good Optimizer for Language Models | |
Teaching Transformers Causal Reasoning through Axiomatic Training | |
Grounding and Evaluation for Large Language Models: Practical Challenges and Lessons Learned (Survey) | |
CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation | |
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context | |
Why are Visually-Grounded Language Models Bad at Image Classification? | |
LoQT: Low Rank Adapters for Quantized Training | |
Metron: Holistic Performance Evaluation Framework for LLM Inference Systems | |
Lynx: An Open Source Hallucination Evaluation Model | |
Mitigating Catastrophic Forgetting in Language Transfer via Model Merging | |
LoRA-Guard: Parameter-Efficient Guardrail Adaptation for Content Moderation of Large Language Models | |
Human-like Episodic Memory for Infinite Context LLMs | |
MUSCLE: A Model Update Strategy for Compatible LLM Evolution | |
H2O-Danube3 Technical Report | |
Context Embeddings for Efficient Answer Generation in RAG | |
SpreadsheetLLM: Encoding Spreadsheets for Large Language Models | |
RoboMorph: Evolving Robot Morphology using Large Language Models | |
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers | |
Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training | |
New Desiderata for Direct Preference Optimization | |
Characterizing Prompt Compression Methods for Long Context Inference | |
Large Models of What? Mistaking Engineering Achievements for Human Linguistic Agency | |
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing | |
MUSE: Machine Unlearning Six-Way Evaluation for Language Models | |
Accuracy is Not All You Need | |
AutoBencher: Creating Salient, Novel, Difficult Datasets for Language Models | |
Outliers and Calibration Sets have Diminishing Effect on Quantization of Modern LLMs | |
Universal Neurons in GPT2 Language Models | |
Agent Instructs Large Language Models to be General Zero-Shot Reasoners | |
Qwen2 Technical Report | |
The Good, The Bad, and The Greedy: Evaluation of LLMs Should Not Ignore Non-Determinism | |
LAB-Bench: Measuring Capabilities of Language Models for Biology Research | |
Q-Sparse: All Large Language Models can be Fully Sparsely-Activated | |
MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models | |
Representing Rule-based Chatbots with Transformers | |
Learning to Refuse: Towards Mitigating Privacy Risks in LLMs | |
Benchmarking Language Model Creativity: A Case Study on Code Generation | |
Mixture-of-Modules: Reinventing Transformers as Dynamic Assemblies of Modules | |
Spontaneous Reward Hacking in Iterative Self-Refinement | |
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients | |
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows? | |
LLM Circuit Analyses Are Consistent Across Training and Scale | |
Foundational Autoraters: Taming Large Language Models for Better Automatic Evaluation | |
NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window? | |
Data-Juicer Sandbox: A Comprehensive Suite for Multimodal Data-Model Co-development | |
Fast Matrix Multiplications for Lookup Table-Quantized LLMs | |
Fine-Tuning and Prompt Optimization: Two Great Steps that Work Better Together | |
Bridging the Gap Between Information Seeking and Product Search Systems: Q&A Recommendation for E-commerce | |
When is the consistent prediction likely to be a correct prediction? | |
Transformer tricks: Removing weights for skipless transformers | |
Transformers represent belief state geometry in their residual stream | |
Debug like a Human: A Large Language Model Debugger via Verifying Runtime Execution Step-by-step | |
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition | |
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models | |
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment | |
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | |
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces | |
Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models | |
A Survey on LoRA of Large Language Models | |
No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations | |
LMMs-Eval: Reality Check on the Evaluation of Large Multimodal Models | |
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos | |
Patch-Level Training for Large Language Models | |
E5-V: Universal Embeddings with Multimodal Large Language Models | |
Case2Code: Learning Inductive Reasoning with Synthetic Data | |
AgentPoison: Red-teaming LLM Agents via Poisoning Memory or Knowledge Bases | |
Spectra: A Comprehensive Study of Ternary, Quantized, and FP16 Language Models | |
Splatfacto-W: A Nerfstudio Implementation of Gaussian Splatting for Unconstrained Photo Collections | |
The Art of Saying No: Contextual Noncompliance in Language Models | |
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | |
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention | |
NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models | |
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression | |
Practical Unlearning for Large Language Models | |
Does Refusal Training in LLMs Generalize to the Past Tense? | |
Automatic Prompt Optimization with "Gradient Descent" and Beam Search | |
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies | |
CodeV: Empowering LLMs for Verilog Generation through Multi-Level Summarization | |
Understanding Reference Policies in Direct Preference Optimization | |
Benchmark Agreement Testing Done Right: A Guide for LLM Benchmark Evaluation | |
PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks | |
Scaling Retrieval-Based Language Models with a Trillion-Token Datastore | |
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study | |
Attention Overflow: Language Model Input Blur during Long-Context Missing Items Recommendation | |
Weak-to-Strong Reasoning | |
Direct-Inverse Prompting: Analyzing LLMs' Discriminative Capacity in Self-Improving Generation | |
Benchmarking Vision Language Models for Cultural Understanding | |
DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations | |
Discovering Bias in Latent Space: An Unsupervised Debiasing Approach | |
A Survey of Prompt Engineering Methods in Large Language Models for Different NLP Tasks | |
DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems | |
Scaling Granite Code Models to 128K Context | |
BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval | |
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models | |
Understanding Counting in Small Transformers: The Interplay between Attention and Feed-Forward Layers | |
Sibyl: Simple yet Effective Agent Framework for Complex Real-world Reasoning | |
Lean-STaR: Learning to Interleave Thinking and Proving | |
GAVEL: Generating Games Via Evolution and Language Models | |
Transformer Layers as Painters | |
AUITestAgent: Automatic Requirements Oriented GUI Function Testing | |
Is Your Model Really A Good Math Reasoner? Evaluating Mathematical Reasoning with Checklist | |
Training on the Test Task Confounds Evaluation and Emergence | |
The Human Factor in AI Red Teaming: Perspectives from Social and Collaborative Computing | |
PaliGemma: A versatile 3B VLM for transfer | |
A Survey on Mixture of Experts | |
Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning | |
Consent in Crisis: The Rapid Decline of the AI Data Commons | |
ChatQA 2: Bridging the Gap to Proprietary LLMs in Long Context and RAG Capabilities | |
Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | |
The Vision of Autonomic Computing: Can LLMs Make It a Reality? | |
EVLM: An Efficient Vision-Language Model for Visual Understanding | |
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference | |
Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle | |
Internal Consistency and Self-Feedback in Large Language Models: A Survey | |
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition | |
SciCode: A Research Coding Benchmark Curated by Scientists | |
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding | |
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning | |
VideoGameBunny: Towards vision assistants for video games | |
GET-Zero: Graph Embodiment Transformer for Zero-shot Embodiment Generalization | |
NNsight and NDIF: Democratizing Access to Foundation Model Internals | |
Fractal Patterns May Illuminate the Success of Next-Token Prediction | |
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct | |
NV-Retriever: Improving text embedding models with effective hard-negative mining | |
Efficient Retrieval with Learned Similarities | |
Knowledge Mechanisms in Large Language Models: A Survey and Perspective | |
Gated Linear Attention Transformers with Hardware-Efficient Training | |
SmoothQuant+: Accurate and Efficient 4-bit Post-Training Weight Quantization for LLM | |
Discrete Flow Matching | |
MIBench: Evaluating Multimodal Large Language Models over Multiple Images | |
BOND: Aligning LLMs with Best-of-N Distillation | |
Foundational Models Defining a New Era in Vision: A Survey and Outlook | |
Shared Imagination: LLMs Hallucinate Alike | |
Aligning Large Language Models with Human: A Survey | |
Compact Language Models via Pruning and Knowledge Distillation | |
Correcting the Mythos of KL-Regularization: Direct Alignment without Overoptimization via Chi-Squared Preference Optimization | |
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis | |
Knesset-DictaBERT: A Hebrew Language Model for Parliamentary Proceedings | |
IrokoBench: A New Benchmark for African Languages in the Age of Large Language Models | |
To FP8 and Back Again: Quantifying the Effects of Reducing Precision on LLM Training Stability | |
Demystifying Chains, Trees, and Graphs of Thoughts | |
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model | |
The Larger the Better? Improved LLM Code-Generation via Budget Reallocation | |
PrimeGuard: Safe and Helpful LLMs through Tuning-Free Routing | |
Testing Occupational Gender Bias in Language Models: Towards Robust Measurement and Zero-Shot Debiasing | |
PERSONA: A Reproducible Testbed for Pluralistic Alignment | |
Scalify: scale propagation for efficient low-precision LLM training | |
Reinforced Prompt Personalization for Recommendation with Large Language Models | |
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents | |
DDK: Distilling Domain Knowledge for Efficient Large Language Models | |
Course-Correction: Safety Alignment Using Synthetic Preferences | |
Longhorn: State Space Models are Amortized Online Learners | |
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization | |
Recursive Introspection: Teaching Language Model Agents How to Self-Improve | |
Keep the Cost Down: A Review on Methods to Optimize LLM's KV-Cache Consumption | |
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? | |
Fluent Student-Teacher Redteaming | |
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? | |
Efficient Inference of Vision Instruction-Following Models with Elastic Cache | |
Very Large-Scale Multi-Agent Simulation in AgentScope | |
AMEX: Android Multi-annotation Expo Dataset for Mobile GUI Agents | |
$VILA^2$: VILA Augmented VILA | |
AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks? | |
Do Generative AI Models Output Harm while Representing Non-Western Cultures: Evidence from A Community-Centered Approach | |
Visual Haystacks: Answering Harder Questions About Sets of Images | |
Dallah: A Dialect-Aware Multimodal Large Language Model for Arabic | |
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach | |
Generalization v.s. Memorization: Tracing Language Models' Capabilities Back to Pretraining Data | |
ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts | |
Prover-Verifier Games improve legibility of LLM outputs | |
Exploring Advanced Large Language Models with LLMsuite | |
Towards Adversarially Robust Vision-Language Models: Insights from Design Choices and Prompt Formatting Techniques | |
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities | |
Vectoring Languages | |
Fine-grained Analysis of In-context Linear Estimation: Data, Architecture, and Beyond | |
LoRA-Pro: Are Low-Rank Adapters Properly Optimized? | |
RadioRAG: Factual Large Language Models for Enhanced Diagnostics in Radiology Using Dynamic Retrieval Augmented Generation | |
RAG-QA Arena: Evaluating Domain Robustness for Long-form Retrieval Augmented Question Answering | |
The Art of Refusal: A Survey of Abstention in Large Language Models | |
SALMON: Self-Alignment with Instructable Reward Models | |
Small Molecule Optimization with Large Language Models | |
Generation Constraint Scaling Can Mitigate Hallucination | |
A Survey on Employing Large Language Models for Text-to-SQL Tasks | |
Trust or Escalate: LLM Judges with Provable Guarantees for Human Agreement | |
Prompt Injection Attacks on Large Language Models in Oncology | |
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher | |
Theia: Distilling Diverse Vision Foundation Models for Robot Learning | |
Diffusion Feedback Helps CLIP See Better | |
Sentiment Analysis of Lithuanian Online Reviews Using Large Language Models | |
VolDoGer: LLM-assisted Datasets for Domain Generalization in Vision-Language Tasks | |
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages | |
Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge | |
SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain | |
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models | |
Integrating Large Language Models into a Tri-Modal Architecture for Automated Depression Classification | |
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains | |
PersonaGym: Evaluating Persona Agents and LLMs | |
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training | |
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? | |
Transformers need glasses! Information over-squashing in language tasks | |
ThinK: Thinner Key Cache by Query-Driven Pruning | |
Meltemi: The first open Large Language Model for Greek | |
Adapting Safe-for-Work Classifier for Malaysian Language Text: Enhancing Alignment in LLM-Ops Framework | |
Machine Unlearning in Generative AI: A Survey | |
A Large Encoder-Decoder Family of Foundation Models For Chemical Language | |
Bringing AI Participation Down to Scale: A Comment on OpenAI's Democratic Inputs to AI Project | |
AI-Assisted Generation of Difficult Math Questions | |
Matryoshka-Adaptor: Unsupervised and Supervised Tuning for Smaller Embedding Dimensions | |
Unlocking Tokens as Data Points for Generalization Bounds on Larger Language Models | |
Demystifying Verbatim Memorization in Large Language Models | |
Can LLMs be Fooled? Investigating Vulnerabilities in LLMs | |
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling | |
The Llama 3 Herd of Models | |
ShieldGemma: Generative AI Content Moderation Based on Gemma | |
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts | |
Adaptive Retrieval-Augmented Generation for Conversational Systems | |
Towards Achieving Human Parity on End-to-end Simultaneous Speech Translation via LLM Agent | |
PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems | |
Latxa: An Open Language Model and Evaluation Suite for Basque | |
Improving Retrieval Augmented Language Model with Self-Reasoning | |
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey | |
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? | |
Data Contamination Report from the 2024 CONDA Shared Task | |
Self-Training with Direct Preference Optimization Improves Chain-of-Thought Reasoning | |
Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack | |
Are LLMs classical or nonmonotonic reasoners? Lessons from generics | |
MM-Vet v2: A Challenging Benchmark to Evaluate Large Multimodal Models for Integrated Capabilities | |
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation | |
Tamper-Resistant Safeguards for Open-Weight LLMs | |
Coarse Correspondence Elicit 3D Spacetime Understanding in Multimodal Language Model | |
Improving Text Embeddings for Smaller Language Models Using Contrastive Fine-tuning | |
OmniParser for Pure Vision Based GUI Agent | |
Finch: Prompt-guided Key-Value Cache Compression | |
Gemma 2: Improving Open Language Models at a Practical Size | |
Inductive or Deductive? Rethinking the Fundamental Reasoning Abilities of LLMs | |
Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models | |
An Empirical Analysis of Compute-Optimal Inference for Problem-Solving with Language Models | |
Enhancing Semantic Similarity Understanding in Arabic NLP with Nested Embedding Learning | |
Improving Retrieval-Augmented Generation in Medicine with Iterative Follow-up Questions | |
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs | |
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations | |
$\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs | |
Apple Intelligence Foundation Language Models | |
Multi-group Uncertainty Quantification for Long-form Text Generation | |
MaskInversion: Localized Embeddings via Optimization of Explainability Maps | |
Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning | |
Transformers are Universal In-context Learners | |
RAGEval: Scenario Specific RAG Evaluation Dataset Generation Framework | |
In-Context Example Selection via Similarity Search Improves Low-Resource Machine Translation | |
Leveraging LLM Reasoning Enhances Personalized Recommender Systems | |
Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost | |
MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models | |
A Survey of Mamba | |
Jailbreaking Text-to-Image Models with LLM-Based Agents | |
Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins | |
MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training | |
Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks | |
Generative Retrieval with Preference Optimization for E-commerce Search | |
The Geometry of Queries: Query-Based Innovations in Retrieval-Augmented Generation | |
Improving Retrieval in Sponsored Search by Leveraging Query Context Signals | |
GRAD-SUM: Leveraging Gradient Summarization for Optimal Prompt Engineering | |
Crafting the Path: Robust Query Rewriting for Information Retrieval | |
Harnessing Large Language Models for Multimodal Product Bundling | |
RAGBench: Explainable Benchmark for Retrieval-Augmented Generation Systems | |
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era | |
Beyond Benchmarks: Evaluating Embedding Model Similarity for Retrieval Augmented Generation Systems | |
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting | |
Faux Polyglot: A Study on Information Disparity in Multilingual Large Language Models | |
Vortex under Ripplet: An Empirical Study of RAG-enabled Applications | |
MemoCRS: Memory-enhanced Sequential Conversational Recommender Systems with Large Language Models | |
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling | |
Reliable Confidence Intervals for Information Retrieval Evaluation Using Generative A.I. | |
AdaCQR: Enhancing Query Reformulation for Conversational Search via Sparse and Dense Retrieval Alignment | |
Ground Every Sentence: Improving Retrieval-Augmented LLMs with Interleaved Reference-Claim Generation | |
Retrieval-augmented generation in multilingual settings | |
Optimization of Retrieval-Augmented Generation Context with Outlier Detection | |
"Glue pizza and eat rocks" -- Exploiting Vulnerabilities in Retrieval-Augmented Generative Models | |
Retrieval-style In-Context Learning for Few-shot Hierarchical Text Classification | |
LumberChunker: Long-Form Narrative Document Segmentation | |
Entropy-Based Decoding for Retrieval-Augmented Large Language Models | |
Improving Zero-shot LLM Re-Ranker with Risk Minimization | |
A Text is Worth Several Tokens: Text Embedding from LLMs Secretly Aligns Well with The Key Tokens | |
D2LLM: Decomposed and Distilled Large Language Models for Semantic Search | |
Retrieval Augmented Zero-Shot Text Classification | |
APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking | |
StackRAG Agent: Improving Developer Answers with Retrieval-Augmented Generation | |
PromptDSI: Prompt-based Rehearsal-free Instance-wise Incremental Learning for Document Retrieval | |
RichRAG: Crafting Rich Responses for Multi-faceted Queries in Retrieval-Augmented Generation | |
Unified Active Retrieval for Retrieval Augmented Generation | |
LLM-enhanced Reranking in Recommender Systems | |
Intermediate Distillation: Data-Efficient Distillation from Black-Box LLMs for Information Retrieval | |
CrAM: Credibility-Aware Attention Modification in LLMs for Combating Misinformation in RAG | |
The Impact of Quantization on Retrieval-Augmented Generation: An Analysis of Small LLMs | |
Improving LLMs for Recommendation with Out-Of-Vocabulary Tokens | |
A Software Engineering Perspective on Testing Large Language Models: Research, Practice, Tools and Benchmarks | |
Supportiveness-based Knowledge Rewriting for Retrieval-augmented Language Modeling | |
Blowfish: Topological and statistical signatures for quantifying ambiguity in semantic search | |
Async Learned User Embeddings for Ads Delivery Optimization | |
Machine Against the RAG: Jamming Retrieval-Augmented Generation with Blocker Documents | |
RE-RAG: Improving Open-Domain QA Performance and Interpretability with Relevance Estimator in Retrieval-Augmented Generation | |
MrRank: Improving Question Answering Retrieval System through Multi-Result Ranking Model | |
Evaluating the External and Parametric Knowledge Fusion of Large Language Models | |
DomainRAG: A Chinese Benchmark for Evaluating Domain-specific Retrieval-Augmented Generation | |
Generative Explore-Exploit: Training-free Optimization of Generative Recommender Systems using LLM Optimizers | |
RAG Does Not Work for Enterprises | |
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models | |
Voice Jailbreak Attacks Against GPT-4o | |
CtrlA: Adaptive Retrieval-Augmented Generation via Probe-Guided Control | |
DeeperImpact: Optimizing Sparse Learned Index Structures | |
Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning | |
Cocktail: A Comprehensive Information Retrieval Benchmark with LLM-Generated Documents Integration | |
Unlocking Multi-View Insights in Knowledge-Dense Retrieval-Augmented Generation | |
RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference | |
RaFe: Ranking Feedback Improves Query Rewriting for RAG | |
Question-Based Retrieval using Atomic Units for Enterprise RAG | |
SynthesizRR: Generating Diverse Datasets with Retrieval Augmentation | |
Words Blending Boxes. Obfuscating Queries in Information Retrieval using Differential Privacy | |
Redefining Information Retrieval of Structured Database via Large Language Models | |
Contextualization with SPLADE for High Recall Retrieval | |
Lifelong Knowledge Editing for LLMs with Retrieval-Augmented Continuous Prompt Learning | |
Comparative Analysis of Retrieval Systems in the Real World | |
Semi-Parametric Retrieval via Binary Token Index | |
Efficient and Responsible Adaptation of Large Language Models for Robust Top-k Recommendations | |
GRAMMAR: Grounded and Modular Methodology for Assessment of Closed-Domain Retrieval-Augmented Language Model | |
Retrieval-Oriented Knowledge for Click-Through Rate Prediction | |
Leveraging Large Language Models for Multimodal Search | |
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs | |
From Matching to Generation: A Survey on Generative Information Retrieval | |
Retrieval Augmented Generation for Domain-specific Question Answering | |
Planning Ahead in Generative Retrieval: Guiding Autoregressive Generation through Simultaneous Decoding | |
Tree of Reviews: A Tree-based Dynamic Iterative Retrieval Framework for Multi-hop Question Answering | |
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models | |
Dubo-SQL: Diverse Retrieval-Augmented Generation and Fine Tuning for Text-to-SQL | |
Generating Diverse Criteria On-the-Fly to Improve Point-wise LLM Rankers | |
Consolidating Ranking and Relevance Predictions of Large Language Models through Post-Processing | |
Recall-Augmented Ranking: Enhancing Click-Through Rate Prediction Accuracy with Cross-Stage Data | |
The Elephant in the Room: Rethinking the Usage of Pre-trained Language Model in Sequential Recommendation | |
Efficient Prompting Methods for Large Language Models: A Survey | |
Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models | |
PMG: Personalized Multimodal Generation with Large Language Models | |
RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm | |
Taxonomy and Analysis of Sensitive User Queries in Generative AI Search | |
Generative Information Retrieval Evaluation | |
End-to-end training of Multimodal Model and ranking Model | |
Event-enhanced Retrieval in Real-time Search | |
Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation | |
Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models | |
CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | |
Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | |
Improving Retrieval Augmented Open-Domain Question-Answering with Vectorized Contexts | |
Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models | |
Where to Move Next: Zero-shot Generalization of LLMs for Next POI Recommendation | |
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems | |
Shallow Cross-Encoders for Low-Latency Retrieval | |
Retrieval-Enhanced Knowledge Editing for Multi-Hop Question Answering in Language Models | |
Generate then Retrieve: Conversational Response Retrieval Using LLMs as Answer and Query Generators | |
Are Large Language Models Good at Utility Judgments? | |
SelfIE: Self-Interpretation of Large Language Model Embeddings | |
Make Large Language Model a Better Ranker | |
Boosting Conversational Question Answering with Fine-Grained Retrieval-Augmentation and Self-Check | |
CoLLEGe: Concept Embedding Generation for Large Language Models | |
Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation | |
JORA: JAX Tensor-Parallel LoRA Library for Retrieval Augmented Fine-Tuning | |
Improving the Robustness of Dense Retrievers Against Typos via Multi-Positive Contrastive Learning | |
Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases | |
Investigating the performance of Retrieval-Augmented Generation and fine-tuning for the development of AI-driven knowledge-based systems | |
RA-ISF: Learning to Answer and Understand from Retrieval Augmentation via Iterative Self-Feedback | |
ToolRerank: Adaptive and Hierarchy-Aware Reranking for Tool Retrieval | |
RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems | |
PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design | |
Chaining text-to-image and large language model: A novel approach for generating personalized e-commerce banners | |
LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems | |
An Interpretable Ensemble of Graph and Language Models for Improving Search Relevance in E-Commerce | |
LLM-Ensemble: Optimal Large Language Model Ensemble Method for E-commerce Product Attribute Value Extraction | |
Embedding-based search in JetBrains IDEs | |
RAM-EHR: Retrieval Augmentation Meets Clinical Predictions on Electronic Health Records | |
Utilizing BERT for Information Retrieval: Survey, Applications, Resources, and Challenges | |
ChatDiet: Empowering Personalized Nutrition-Oriented Food Recommender Chatbots through an LLM-Augmented Framework | |
Meta-Task Prompting Elicits Embeddings from Large Language Models | |
The First Place Solution of WSDM Cup 2024: Leveraging Large Language Models for Conversational Multi-Doc QA | |
Unsupervised Information Refinement Training of Large Language Models for Retrieval-Augmented Generation | |
Corpus-Steered Query Expansion with Large Language Models | |
REAR: A Relevance-Aware Retrieval-Augmented Framework for Open-Domain Question Answering | |
The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG) | |
Large Language Model Augmented Exercise Retrieval for Personalized Language Learning | |
ESE: Espresso Sentence Embeddings | |
ARL2: Aligning Retrievers for Black-box Large Language Models via Self-guided Adaptive Relevance Labeling | |
Self-DC: When to retrieve and When to generate? Self Divide-and-Conquer for Compositional Unknown Questions | |
Retrieval Helps or Hurts? A Deeper Dive into the Efficacy of Retrieval Augmentation to Language Models | |
Are ELECTRA's Sentence Embeddings Beyond Repair? The Case of Semantic Textual Similarity | |
Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge | |
ARKS: Active Retrieval in Knowledge Soup for Code Generation | |
Explain then Rank: Scale Calibration of Neural Rankers Using Natural Language Explanations from Large Language Models | |
BIDER: Bridging Knowledge Inconsistency for Efficient Retrieval-Augmented LLMs via Key Supporting Evidence | |
Small Models, Big Insights: Leveraging Slim Proxy Models To Decide When and What to Retrieve for LLMs | |
TriSampler: A Better Negative Sampling Principle for Dense Retrieval | |
EcoRank: Budget-Constrained Text Re-ranking Using Large Language Models | |
Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models | |
Pandora: Jailbreak GPTs by Retrieval Augmented Generation Poisoning | |
PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers | |
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers | |
G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering | |
T-RAG: Lessons from the LLM Trenches | |
Prompt Perturbation in Retrieval-Augmented Generation based Large Language Models | |
REALM: RAG-Driven Enhancement of Multimodal Electronic Health Records Analysis via Large Language Models | |
Chained Tuning Leads to Biased Forgetting | |
Non-autoregressive Generative Models for Reranking Recommendation | |
History, Development, and Principles of Large Language Models-An Introductory Survey | |
Multimodal Query Suggestion with Multi-Agent Reinforcement Learning from Human Feedback | |
Leveraging LLMs for Unsupervised Dense Retriever Ranking | |
RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation | |
Retrieve to Explain: Evidence-driven Predictions with Language Models | |
C-RAG: Certified Generation Risks for Retrieval-Augmented Language Models | |
Locally-Adaptive Quantization for Streaming Vector Search | |
HiQA: A Hierarchical Contextual Augmentation RAG for Massive Documents QA | |
When Large Language Models Meet Vector Databases: A Survey | |
Data-efficient Fine-tuning for LLM-based Recommendation | |
CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models | |
Re3val: Reinforced and Reranked Generative Retrieval | |
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models | |
Generative Dense Retrieval: Memory Can Be a Burden | |
The Chronicles of RAG: The Retriever, the Chunk and the Generator | |
Curator: Efficient Indexing for Multi-Tenant Vector Databases | |
Bridging the Preference Gap between Retrievers and LLMs | |
InRanker: Distilled Rankers for Zero-shot Information Retrieval | |
Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis | |
ChatGPT for Conversational Recommendation: Refining Recommendations by Reprompting with Feedback | |
Unsupervised hard Negative Augmentation for contrastive learning | |
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models | |
RecRanker: Instruction Tuning Large Language Model as Ranker for Top-k Recommendation | |
Large Language Models are Not Stable Recommender Systems | |
ESPN: Memory-Efficient Multi-Vector Information Retrieval | |
Unlocking the Potential of Large Language Models for Explainable Recommendations | |
Preliminary Study on Incremental Learning for Large Language Model-based Recommender Systems | |
Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM | |
Dense X Retrieval: What Retrieval Granularity Should We Use? | |
Some things are more CRINGE than others: Iterative Preference Optimization with the Pairwise Cringe Loss | |
End-to-End Retrieval with Learned Dense and Sparse Representations Using Lucene | |
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions | |
ControlRec: Bridging the Semantic Gap between Language Model and Personalized Recommendation | |
RecExplainer: Aligning Large Language Models for Explaining Recommendation Models | |
Golden-Retriever: High-Fidelity Agentic Retrieval Augmented Generation for Industrial Knowledge Base | |
Back to Basics: A Simple Recipe for Improving Out-of-Domain Retrieval in Dense Encoders | |
On Retrieval Augmentation and the Limitations of Language Model Training | |
ARES: An Automated Evaluation Framework for Retrieval-Augmented Generation Systems | |
Text Retrieval with Multi-Stage Re-Ranking Models | |
LLatrieval: LLM-Verified Retrieval for Verifiable Generation | |
CoverBench: A Challenging Benchmark for Complex Claim Verification | |
Knowledge-Augmented Large Language Models for Personalized Contextual Query Suggestion | |
Exploring Fine-tuning ChatGPT for News Recommendation | |
Self-Taught Evaluators | |
RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation | |
The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models | |
Mixture of Experts with Mixture of Precisions for Tuning Quality of Service | |
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models | |
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | |
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining | |
MooER: LLM-based Speech Recognition and Translation Models from Moore Threads | |
Language Model Can Listen While Speaking | |
Unleashing the Power of Data Tsunami: A Comprehensive Survey on Data Assessment and Selection for Instruction Tuning of Language Models | |
Mini-Monkey: Alleviate the Sawtooth Effect by Multi-Scale Adaptive Cropping | |
MiniCPM-V: A GPT-4V Level MLLM on Your Phone | |
What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | |
The Stability-Efficiency Dilemma: Investigating Sequence Length Warmup for Training GPT Models | |
Can LLMs predict the convergence of Stochastic Gradient Descent? | |
The Impact of Hyperparameters on Large Language Model Inference Performance: An Evaluation of vLLM and HuggingFace Pipelines | |
LLaVA-OneVision: Easy Visual Task Transfer | |
The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design | |
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters | |
Fact Finder -- Enhancing Domain Expertise of Large Language Models by Incorporating Knowledge Graphs | |
A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighborhood Search | |
Leveraging Inter-Chunk Interactions for Enhanced Retrieval in Large Language Model-Based Question Answering | |
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | |
Generative Retrieval with Few-shot Indexing | |
Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval | |
SARATHI: Efficient LLM Inference by Piggybacking Decodes with Chunked Prefills | |
Can We Trust LLMs? Mitigate Overconfidence Bias in LLMs through Knowledge Transfer | |
StructEval: Deepen and Broaden Large Language Model Assessment via Structured Evaluation | |
Synthesizing Text-to-SQL Data from Weak and Strong LLMs | |
Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models | |
CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases | |
NACL: A General and Effective KV Cache Eviction Framework for LLMs at Inference Time | |
Comparative Analysis of Open-Source Language Models in Summarizing Medical Text Data | |
EXAONE 3.0 7.8B Instruction Tuned Language Model | |
Sketch-Guided Constrained Decoding for Boosting Blackbox Large Language Models without Logit Access | |
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models | |
Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models | |
Learning Task Decomposition to Assist Humans in Competitive Programming | |
Better Alignment with Instruction Back-and-Forth Translation | |
Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models | |
Deeploy: Enabling Energy-Efficient Deployment of Small Language Models On Heterogeneous Microcontrollers | |
LLM-DetectAIve: a Tool for Fine-Grained Machine-Generated Text Detection | |
Lifelong Personalized Low-Rank Adaptation of Large Language Models for Recommendation | |
ULLME: A Unified Framework for Large Language Model Embeddings with Generation-Augmented Learning | |
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI | |
Diffusion Guided Language Modeling | |
Conversational Prompt Engineering | |
Trans-Tokenization and Cross-lingual Vocabulary Transfers: Language Adaptation of LLMs for Low-Resource NLP | |
Pairing Clustered Inverted Indexes with kNN Graphs for Fast Approximate Retrieval over Learned Sparse Representations | |
Enhancing Robustness of Retrieval-Augmented Language Models with In-Context Learning | |
EfficientRAG: Efficient Retriever for Multi-Hop Question Answering | |
Pairwise Judgment Formulation for Semantic Embedding Model in Web Search | |
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency | |
PhiloBERTA: A Transformer-Based Cross-Lingual Analysis of Greek and Latin Lexicons | |
Interpreting Attention Layer Outputs with Sparse Autoencoders | |
Fine-tuning language models to find agreement among humans with diverse preferences | |
VITA: Towards Open-Source Interactive Omni Multimodal LLM | |
A Survey of NL2SQL with Large Language Models: Where are we, and where are we going? | |
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding | |
Rag and Roll: An End-to-End Evaluation of Indirect Prompt Manipulations in LLM-based Application Frameworks | |
Early Exit Strategies for Approximate k-NN Search in Dense Retrieval | |
HybridRAG: Integrating Knowledge Graphs and Vector Retrieval Augmented Generation for Efficient Information Extraction | |
Relevance Filtering for Embedding-based Retrieval | |
OpenResearcher: Unleashing AI for Accelerated Scientific Research | |
Enhancing Relevance of Embedding-based Retrieval at Walmart | |
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | |
Natural Language Outlines for Code: Literate Programming in the LLM Era | |
ToolSandbox: A Stateful, Conversational, Interactive Evaluation Benchmark for LLM Tool Use Capabilities | |
Affective Computing in the Era of Large Language Models: A Survey from the NLP Perspective | |
1.5-Pints Technical Report: Pretraining in Days, Not Months -- Your Language Model Thrives on Quality Data | |
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery | |
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | |
Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation | |
PhysBERT: A Text Embedding Model for Physics Scientific Literature | |
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment | |
Med42-v2: A Suite of Clinical LLMs | |
Your Context Is Not an Array: Unveiling Random Access Limitations in Transformers | |
PERSOMA: PERsonalized SOft ProMpt Adapter Architecture for Personalized Language Prompting | |
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs | |
The Earth is Flat because...: Investigating LLMs' Belief towards Misinformation via Persuasive Conversation | |
Layerwise Recurrent Router for Mixture-of-Experts | |
Prompt Tuning as User Inherent Profile Inference Machine | |
Large Language Model Agent in Financial Trading: A Survey | |
Amuro & Char: Analyzing the Relationship between Pre-Training and Fine-Tuning of Large Language Models | |
Design Proteins Using Large Language Models: Enhancements and Comparative Analyses | |
Hermes 3 Technical Report | |
FuxiTranyu: A Multilingual Large Language Model Trained with Balanced Data | |
WeKnow-RAG: An Adaptive Approach for Retrieval-Augmented Generation Integrating Web Search and Knowledge Graphs | |
Seeing and Understanding: Bridging Vision with Chemical Knowledge Via ChemVLM | |
InfinityMATH: A Scalable Instruction Tuning Dataset in Programmatic Mathematical Reasoning | |
Aquila2 Technical Report | |
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents | |
Hierarchical Structured Neural Network for Retrieval | |
BMX: Entropy-weighted Similarity and Semantic-enhanced Lexical Search | |
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs | |
How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs | |
MALADE: Orchestration of LLM-powered Agents with Retrieval Augmented Generation for Pharmacovigilance | |
Can Large Language Models Understand Symbolic Graphics Programs? | |
BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | |
DaRec: A Disentangled Alignment Framework for Large Language Model and Recommender System | |
Mamba Retriever: Utilizing Mamba for Effective and Efficient Dense Retrieval | |
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability | |
Post-Training Sparse Attention with Double Sparsity | |
Large language models can be zero-shot anomaly detectors for time series? | |
The ShareLM Collection and Plugin: Contributing Human-Model Chats for the Benefit of the Community | |
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm | |
FuseChat: Knowledge Fusion of Chat Models | |
Does Reasoning Emerge? Examining the Probabilities of Causation in Large Language Models | |
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities | |
NL2OR: Solve Complex Operations Research Problems Using Natural Language Inputs | |
Towards Robust and Cost-Efficient Knowledge Unlearning for Large Language Models | |
Min P Sampling: Balancing Creativity and Coherence at High Temperature | |
LLM Stability: A detailed analysis with some surprises | |
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models | |
A Survey on Benchmarks of Multimodal Large Language Models | |
Where is the signal in tokenization space? | |
JPEG-LM: LLMs as Image Generators with Canonical Codec Representations | |
W-RAG: Weakly Supervised Dense Retrieval in RAG for Open-domain Question Answering | |
Cropper: Vision-Language Model for Image Cropping through In-Context Learning | |
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering | |
Can Large Language Models Reason? A Characterization via 3-SAT | |
Large language models can consistently generate high-quality content for election disinformation operations | |
LongVILA: Scaling Long-Context Visual Language Models for Long Videos | |
Meta Knowledge for Retrieval Augmented Large Language Models | |
Authorship Attribution in the Era of LLMs: Problems, Methodologies, and Challenges | |
Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models | |
Graph Retrieval-Augmented Generation: A Survey | |
Patched MOA: optimizing inference for diverse software development tasks | |
Patched RTC: evaluating LLMs for diverse software development tasks | |
InstructCoder: Instruction Tuning Large Language Models for Code Editing | |
To Code, or Not To Code? Exploring Impact of Code in Pre-training | |
Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model | |
HMoE: Heterogeneous Mixture of Experts for Language Modeling | |
Synergistic Approach for Simultaneous Optimization of Monolingual, Cross-lingual, and Multilingual Information Retrieval | |
Analysis of Plan-based Retrieval for Grounded Text Generation | |
NeCo: Improving DINOv2's spatial representations in 19 GPU hours with Patch Neighbor Consistency | |
Ferret: Faster and Effective Automated Red Teaming with Reward-Based Scoring Technique | |
Goldfish: Monolingual Language Models for 350 Languages | |
BLADE: Benchmarking Language Model Agents for Data-Driven Science | |
TableBench: A Comprehensive and Complex Benchmark for Table Question Answering | |
Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation | |
Beneath the Surface of Consistency: Exploring Cross-lingual Knowledge Representation Sharing in LLMs | |
See What LLMs Cannot Answer: A Self-Challenge Framework for Uncovering LLM Weaknesses | |
LLM Pruning and Distillation in Practice: The Minitron Approach | |
Critique-out-Loud Reward Models | |
FocusLLM: Scaling LLM's Context by Parallel Decoding | |
First Activations Matter: Training-Free Methods for Dynamic Activation in Large Language Models | |
StructuredRAG: JSON Response Formatting with Large Language Models | |
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | |
RAGLAB: A Modular and Research-Oriented Unified Framework for Retrieval-Augmented Generation | |
UniFashion: A Unified Vision-Language Model for Multimodal Fashion Retrieval and Generation | |
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval | |
CTP-LLM: Clinical Trial Phase Transition Prediction Using Large Language Models | |
Backward-Compatible Aligned Representations via an Orthogonal Transformation Layer | |
Great Memory, Shallow Reasoning: Limits of $k$NN-LMs | |
Unboxing Occupational Bias: Grounded Debiasing LLMs with U.S. Labor Data | |
Flexora: Flexible Low Rank Adaptation for Large Language Models | |
Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information | |
Controllable Text Generation for Large Language Models: A Survey | |
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation | |
Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications | |
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs | |
Drama Engine: A Framework for Narrative Agents | |
Strategist: Learning Strategic Skills by LLMs via Bi-Level Tree Search | |
Large Language Models as Foundations for Next-Gen Dense Retrieval: A Comprehensive Empirical Assessment | |
Vintern-1B: An Efficient Multimodal Large Language Model for Vietnamese | |
SPARK: Multi-Vision Sensor Perception and Reasoning Benchmark for Large-scale Vision-Language Models | |
ConflictBank: A Benchmark for Evaluating the Influence of Knowledge Conflicts in LLM | |
Automating Thought of Search: A Journey Towards Soundness and Completeness | |
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation | |
Evidence-backed Fact Checking using RAG and Few-Shot In-Context Learning with LLMs | |
Matmul or No Matmul in the Era of 1-bit LLMs | |
Cross-Modal Safety Alignment: Is textual unlearning all you need? | |
Unlocking the Potential of Large Language Models for Clinical Text Anonymization: A Comparative Study | |
Conveyor: Efficient Tool-aware LLM Serving with Tool Partial Execution | |
QUB-Cirdan at "Discharge Me!": Zero shot discharge letter generation by open-source LLM | |
Exploring Backdoor Attacks against Large Language Model-based Decision Making | |
Phantom: General Trigger Attacks on Retrieval Augmented Language Generation | |
Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters | |
Visual Perception by Large Language Model's Weights | |
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs | |
Towards Hierarchical Multi-Agent Workflows for Zero-Shot Prompt Optimization | |
PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization | |
Nadine: An LLM-driven Intelligent Social Robot with Affective Capabilities and Human-like Memory | |
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning CodeLLMs | |
InstructionCP: A fast approach to transfer Large Language Models into target language | |
KNOW: A Real-World Ontology for Knowledge Capture with Large Language Models | |
InterPreT: Interactive Predicate Learning from Language Feedback for Generalizable Task Planning | |
Two Optimizers Are Better Than One: LLM Catalyst Empowers Gradient-Based Optimization for Prompt Tuning | |
One-Shot Safety Alignment for Large Language Models via Optimal Dualization | |
Are Large Language Models Chameleons? | |
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding | |
Towards Next-Generation Urban Decision Support Systems through AI-Powered Construction of Scientific Ontology using Large Language Models -- A Case in Optimizing Intermodal Freight Transportation | |
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | |
Learning from Litigation: Graphs and LLMs for Retrieval and Reasoning in eDiscovery | |
Can Graph Learning Improve Task Planning? | |
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors | |
Towards Faithful Chain-of-Thought: Large Language Models are Bridging Reasoners | |
Language Generation with Strictly Proper Scoring Rules | |
Compressing Large Language Models using Low Rank and Low Precision Decomposition | |
Video Enriched Retrieval Augmented Generation Using Aligned Video Captions | |
Mechanistic Interpretability of Binary and Ternary Transformers | |
Enhanced Robot Arm at the Edge with NLP and Vision Systems | |
Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback | |
HEART-felt Narratives: Tracing Empathy and Narrative Style in Personal Stories with LLMs | |
Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | |
THREAD: Thinking Deeper with Recursive Spawning | |
Match, Compare, or Select? An Investigation of Large Language Models for Entity Matching | |
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | |
LLM-Assisted Static Analysis for Detecting Security Vulnerabilities | |
CLAQ: Pushing the Limits of Low-Bit Post-Training Quantization for LLMs | |
Autoformalizing Euclidean Geometry | |
LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding | |
Tokenization Matters! Degrading Large Language Models through Challenging Their Tokenization | |
MotionLLM: Multimodal Motion-Language Learning with Large Language Models | |
Exploring the LLM Journey from Cognition to Expression with Linear Representations | |
A Large Language Model-based multi-agent manufacturing system for intelligent shopfloor | |
TIE: Revolutionizing Text-based Image Editing for Complex-Prompt Following and High-Fidelity Editing | |
Laurel: Generating Dafny Assertions Using Large Language Models | |
LLMs for User Interest Exploration in Large-scale Recommendation Systems | |
Devil's Advocate: Anticipatory Reflection for LLM Agents | |
SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs | |
Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models | |
Mechanism Design for LLM Fine-tuning with Multiple Reward Models | |
FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference | |
A statistical framework for weak-to-strong generalization | |
No Two Devils Alike: Unveiling Distinct Mechanisms of Fine-tuning Attacks | |
GeneAgent: Self-verification Language Agent for Gene Set Knowledge Discovery using Domain Databases | |
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection | |
C3LLM: Conditional Multimodal Content Generation Using Large Language Models | |
Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting | |
Finetuning Large Language Model for Personalized Ranking | |
Towards Completeness-Oriented Tool Retrieval for Large Language Models | |
Keypoint-based Progressive Chain-of-Thought Distillation for LLMs | |
SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models | |
Semantic Importance-Aware Communications with Semantic Correction Using Large Language Models | |
Claim Verification in the Age of Large Language Models: A Survey | |
Streaming Long Video Understanding with Large Language Models | |
Your Large Language Models Are Leaving Fingerprints | |
WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response | |
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions | |
Why Not Transform Chat Large Language Models to Non-English? | |
TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment | |
LOGIN: A Large Language Model Consulted Graph Neural Network Training Framework | |
Sunnie: An Anthropomorphic LLM-Based Conversational Agent for Mental Well-Being Activity Recommendation | |
CG-FedLLM: How to Compress Gradients in Federated Fine-tuning for Large Language Models | |
DSTI at LLMs4OL 2024 Task A: Intrinsic versus extrinsic knowledge for type classification | |
How to set AdamW's weight decay as you scale model and dataset size | |
Safety Alignment for Vision Language Models | |
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation | |
Large Language Models are Effective Priors for Causal Graph Discovery | |
HighwayLLM: Decision-Making and Navigation in Highway Driving with RL-Informed Language Model | |
WaterPool: A Watermark Mitigating Trade-offs among Imperceptibility, Efficacy and Robustness | |
LIRE: listwise reward enhancement for preference alignment | |
Disperse-Then-Merge: Pushing the Limits of Instruction Tuning via Alignment Tax Reduction | |
TrojanRAG: Retrieval-Augmented Generation Can Be Backdoor Driver in Large Language Models | |
RoundTable: Leveraging Dynamic Schema and Contextual Autocomplete for Enhanced Query Precision in Tabular Question Answering | |
VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding | |
Lusifer: LLM-based User SImulated Feedback Environment for online Recommender systems | |
AdpQ: A Zero-shot Calibration Free Adaptive Post Training Quantization Method for LLMs | |
Large Language Models (LLMs) Assisted Wireless Network Deployment in Urban Settings | |
Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance | |
Towards Evaluating and Building Versatile Large Language Models for Medicine | |
LLMs are not Zero-Shot Reasoners for Biomedical Information Extraction | |
RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment | |
Pruning By Explaining Revisited: Optimizing Attribution Methods to Prune CNNs and Transformers | |
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model | |
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? | |
Domain-specific long text classification from sparse relevant information | |
DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation | |
Instruct-DeBERTa: A Hybrid Approach for Aspect-based Sentiment Analysis on Textual Reviews | |
Insights from Benchmarking Frontier Language Models on Web App Code Generation | |
Can LLM be a Good Path Planner based on Prompt Engineering? Mitigating the Hallucination for Path Planning | |
Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates | |
Semantic Alignment for Multimodal Large Language Models | |
Memory-Efficient LLM Training with Online Subspace Descent | |
A Survey of Hallucination in Large Foundation Models | |
MEDCO: Medical Education Copilots Based on A Multi-Agent Framework | |
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | |
Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions | |
SWE-bench-java: A GitHub Issue Resolving Benchmark for Java | |
The Mamba in the Llama: Distilling and Accelerating Hybrid Models | |
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning | |
MobileQuant: Mobile-friendly Quantization for On-device Language Models | |
LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs | |
LLaVaOLMoBitnet1B: Ternary LLM goes Multimodal! | |
Efficient Detection of Toxic Prompts in Large Language Models | |
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler | |
Multi-Layer Transformers Gradient Can be Approximated in Almost Linear Time | |
A Web-Based Solution for Federated Learning with LLM-Based Automation | |
NanoFlow: Towards Optimal Large Language Model Serving Throughput | |
A Survey of Large Language Models for European Languages | |
HiRED: Attention-Guided Token Dropping for Efficient Inference of High-Resolution Vision-Language Models in Resource-Constrained Environments | |
Challenges and Responses in the Practice of Large Language Models | |
PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars | |
Inverse Scaling: When Bigger Isn't Better | |
Generative Verifiers: Reward Modeling as Next-Token Prediction | |
Project SHADOW: Symbolic Higher-order Associative Deductive reasoning On Wikidata using LM probing | |
BaichuanSEED: Sharing the Potential of ExtensivE Data Collection and Deduplication by Introducing a Competitive Large Language Model Baseline | |
DocLayLLM: An Efficient and Effective Multi-modal Extension of Large Language Models for Text-rich Document Understanding | |
MRSE: An Efficient Multi-modality Retrieval System for Large Scale E-commerce | |
Writing in the Margins: Better Inference Pattern for Long Context Retrieval | |
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning | |
PAT: Pruning-Aware Tuning for Large Language Models | |
Text2SQL is Not Enough: Unifying AI and Databases with TAG | |
Into the Unknown Unknowns: Engaged Human Learning through Participation in Language Model Agent Conversations | |
Smart Multi-Modal Search: Contextual Sparse and Dense Embedding Integration in Adobe Express | |
Agentic Retrieval-Augmented Generation for Time Series Analysis | |
LLM-3D Print: Large Language Models To Monitor and Control 3D Printing | |
A Law of Next-Token Prediction in Large Language Models | |
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders | |
WebPilot: A Versatile and Autonomous Multi-Agent System for Web Task Execution with Strategic Exploration | |
Efficient LLM Scheduling by Learning to Rank | |
Leveraging Open Knowledge for Advancing Task Expertise in Large Language Models | |
Decentralized LLM Inference over Edge Networks with Energy Harvesting | |
LLM-Based Multi-Hop Question Answering with Knowledge Graph Integration in Evolving Environments | |
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | |
Knowledge Navigator: LLM-guided Browsing Framework for Exploratory Search in Scientific Literature | |
Geometry of Lightning Self-Attention: Identifiability and Dimension | |
LogicGame: Benchmarking Rule-Based Reasoning Abilities of Large Language Models | |
Conan-embedding: General Text Embedding with More and Better Negative Samples | |
Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models | |
ReMamba: Equip Mamba with Effective Long-Sequence Modeling | |
Awes, Laws, and Flaws From Today's LLM Research | |
Persuasion Games using Large Language Models | |
Can Unconfident LLM Annotations Be Used for Confident Conclusions? | |
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling | |
Jina-ColBERT-v2: A General-Purpose Multilingual Late Interaction Retriever | |
Transformers Meet ACT-R: Repeat-Aware and Sequential Listening Session Recommendation | |
A Survey on Evaluating Large Language Models in Code Generation Tasks | |
Law of Vision Representation in MLLMs | |
SynDL: A Large-Scale Synthetic Test Collection | |
Rethinking Sparse Lexical Representations for Image Retrieval in the Age of Rising Multi-Modal Large Language Models | |
StyleRemix: Interpretable Authorship Obfuscation via Distillation and Perturbation of Style Elements | |
Understanding the User: An Intent-Based Ranking Dataset | |
Iterative Graph Alignment | |
Icing on the Cake: Automatic Code Summarization at Ericsson | |
Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | |
LLMs generate structurally realistic social networks but overestimate political homophily | |
Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems | |
Rethinking Tokenization: Crafting Better Tokenizers for Large Language Models | |
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation | |
Multi-Party Chat: Conversational Agents in Group Settings with Humans and Models | |
MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents | |
LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation | |
GIFT-SW: Gaussian noise Injected Fine-Tuning of Salient Weights for LLMs | |
InkubaLM: A small language model for low-resource African languages | |
SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section | |
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization | |
Automatic Differential Diagnosis using Transformer-Based Multi-Label Sequence Classification | |
SciLitLLM: How to Adapt LLMs for Scientific Literature Understanding | |
AutoGen Studio: A No-Code Developer Tool for Building and Debugging Multi-Agent Systems | |
CURLoRA: Stable LLM Continual Fine-Tuning and Catastrophic Forgetting Mitigation | |
MaFeRw: Query Rewriting with Multi-Aspect Feedbacks for Retrieval-Augmented Large Language Models | |
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer | |
MemLong: Memory-Augmented Retrieval for Long Text Modeling | |
BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training | |
Cross-Modal Learning for Chemistry Property Prediction: Large Language Models Meet Graph Machine Learning | |
SYNTHEVAL: Hybrid Behavioral Testing of NLP Models with Synthetic CheckLists | |
CLOCR-C: Context Leveraging OCR Correction with Pre-trained Language Models | |
Selective Preference Optimization via Token-Level Reward Function Estimation | |
Impact of ChatGPT on the writing style of condensed matter physicists | |
WildFeedback: Aligning LLMs With In-situ User Interactions And Feedback | |
Pandora's Box or Aladdin's Lamp: A Comprehensive Analysis Revealing the Role of RAG Noise in Large Language Models | |
ImageBind-LLM: Multi-modality Instruction Tuning | |
Transformers as Support Vector Machines | |
LLM-GAN: Construct Generative Adversarial Network Through Large Language Models For Explainable Fake News Detection | |
RACONTEUR: A Knowledgeable, Insightful, and Portable LLM-Powered Shell Command Explainer | |
OLMoE: Open Mixture-of-Experts Language Models | |
BEAVER: An Enterprise Benchmark for Text-to-SQL | |
Foundations of Large Language Model Compression -- Part 1: Weight Quantization | |
Contemporary Model Compression on Large Language Models Inference | |
rerankers: A Lightweight Python Library to Unify Ranking Methods | |
FuzzCoder: Byte-level Fuzzing Test via Large Language Model | |
LUK: Empowering Log Understanding with Expert Knowledge from Large Language Models | |
Focus Agent: LLM-Powered Virtual Focus Group | |
A Fresh Take on Stale Embeddings: Improving Dense Retriever Training with Corrector Networks | |
AgentRE: An Agent-Based Framework for Navigating Complex Information Landscapes in Relation Extraction | |
In Defense of RAG in the Era of Long-Context Language Models | |
Laser: Parameter-Efficient LLM Bi-Tuning for Sequential Recommendation with Collaborative Information | |
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | |
ProGRes: Prompted Generative Rescoring on ASR n-Best | |
Augmented Reality without Borders: Achieving Precise Localization Without Maps | |
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming | |
CogVLM2: Visual Language Models for Image and Video Understanding | |
Mamba or Transformer for Time Series Forecasting? Mixture of Universals (MoU) Is All You Need | |
In-Context Imitation Learning via Next-Token Prediction | |
A Practitioner's Guide to Continual Multimodal Pretraining | |
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA | |
LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture | |
Configurable Foundation Models: Building LLMs from a Modular Perspective | |
Towards a Unified View of Preference Learning for Large Language Models: A Survey | |
A Comparative Study of Pre-training and Self-training | |
Pooling And Attention: What Are Effective Designs For LLM-Based Embedding Models? | |
RouterRetriever: Exploring the Benefits of Routing over Multiple Expert Embedding Models | |
Diversify-verify-adapt: Efficient and Robust Retrieval-Augmented Ambiguous Question Answering | |
NUDGE: Lightweight Non-Parametric Fine-Tuning of Embeddings for Retrieval | |
WildVis: Open Source Visualizer for Million-Scale Chat Logs in the Wild | |
Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining | |
Unforgettable Generalization in Language Models | |
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation | |
GenAgent: Build Collaborative AI Systems with Automated Workflow Generation -- Case Studies on ComfyUI | |
Imitating Language via Scalable Inverse Reinforcement Learning | |
Statically Contextualizing Large Language Models with Typed Holes | |
ContextCite: Attributing Model Generation to Context | |
TinyAgent: Function Calling at the Edge | |
The MERIT Dataset: Modelling and Efficiently Rendering Interpretable Transcripts | |
PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action | |
Ruri: Japanese General Text Embeddings | |
On-Device Language Models: A Comprehensive Review | |
Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text | |
Cog-GA: A Large Language Models-based Generative Agent for Vision-Language Navigation in Continuous Environments | |
Building Math Agents with Multi-Turn Iterative Preference Learning | |
Large Language Models and Cognitive Science: A Comprehensive Review of Similarities, Differences, and Challenges | |
Attention Heads of Large Language Models: A Survey | |
Planning In Natural Language Improves LLM Search For Code Generation | |
On the Limited Generalization Capability of the Implicit Reward Model Induced by Direct Preference Optimization | |
From MOOC to MAIC: Reshaping Online Teaching and Learning through LLM-driven Agents | |
Extracting Paragraphs from LLM Token Activations | |
xLAM: A Family of Large Action Models to Empower AI Agent Systems | |
Large Language Model-Based Agents for Software Engineering: A Survey | |
SmileyLlama: Modifying Large Language Models for Directed Chemical Space Exploration | |
Report Cards: Qualitative Evaluation of Language Models Using Natural Language Summaries | |
CHAMP: A Competition-level Dataset for Fine-Grained Analyses of LLMs' Mathematical Reasoning Capabilities | |
Evolution of Social Norms in LLM Agents using Natural Language | |
A Static Evaluation of Code Completion by Large Language Models | |
Universal Transformers | |
Hardware Acceleration of LLMs: A comprehensive survey and comparison | |
Scaling Laws for Economic Productivity: Experimental Evidence in LLM-Assisted Translation | |
The Compressor-Retriever Architecture for Language Model OS | |
A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine | |
A Survey for Large Language Models in Biomedicine | |
Watermarking Techniques for Large Language Models: A Survey | |
Genetic Approach to Mitigate Hallucination in Generative IR | |
Theory, Analysis, and Best Practices for Sigmoid Self-Attention | |
VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation | |
RLPF: Reinforcement Learning from Prediction Feedback for User Summarization with LLMs | |
Learning vs Retrieval: The Role of In-Context Examples in Regression with LLMs | |
Advancing Automated Knowledge Transfer in Evolutionary Multitasking via Large Language Models | |
An overview of domain-specific foundation model: key technologies, applications and challenges | |
Flexible and Effective Mixing of Large Language Models into a Mixture of Domain Experts | |
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding | |
RETAIN: Interactive Tool for Regression Testing Guided LLM Migration | |
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data | |
MoRe Fine-Tuning with 10x Fewer Parameters | |
Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity | |
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers | |
AnyMatch -- Efficient Zero-Shot Entity Matching with a Small Language Model | |
Spinning the Golden Thread: Benchmarking Long-Form Generation in Language Models | |
MMEvol: Empowering Multimodal Large Language Models with Evol-Instruct | |
TurkishMMLU: Measuring Massive Multitask Language Understanding in Turkish | |
Benchmarking Chinese Knowledge Rectification in Large Language Models | |
A System and Benchmark for LLM-based Q&A on Heterogeneous Data | |
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery | |
CauseJudger: Identifying the Cause with LLMs for Abductive Logical Reasoning | |
Tele-LLMs: A Series of Specialized Large Language Models for Telecommunications | |
OneGen: Efficient One-Pass Unified Generation and Retrieval for LLMs | |
Achieving Peak Performance for Large Language Models: A Systematic Review | |
Late Chunking: Contextual Chunk Embeddings Using Long-Context Embedding Models | |
Improving Pretraining Data Using Perplexity Correlations | |
LLMs Will Always Hallucinate, and We Need to Live With This | |
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance | |
How Does Code Pretraining Affect Language Model Task Performance? | |
Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation | |
A Comprehensive Survey of LLM Alignment Techniques: RLHF, RLAIF, PPO, DPO and More | |
Radiology-Llama2: Best-in-Class Large Language Model for Radiology | |
Synthetic continued pretraining | |
Agent Workflow Memory | |
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering | |
STORE: Streamlining Semantic Tokenization and Generative Recommendation with A Single LLM | |
What is the Role of Small Models in the LLM Era: A Survey | |
LLaMA-Omni: Seamless Speech Interaction with Large Language Models | |
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering | |
Operational Advice for Dense and Sparse Retrievers: HNSW, Flat, or Inverted Indexes? | |
Length Desensitization in Directed Preference Optimization | |
STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | |
Can Large Language Models Unlock Novel Scientific Research Ideas? | |
SongCreator: Lyrics-based Universal Song Generation | |
Self-Harmonized Chain of Thought | |
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories | |
AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge | |
MEDIC: Towards a Comprehensive Framework for Evaluating LLMs in Clinical Applications | |
PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation | |
Generative Hierarchical Materials Search | |
Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources | |
What Makes a Maze Look Like a Maze? | |
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? | |
Retrieval Augmented Thought Process for Private Data Handling in Healthcare | |
Dense Reward for Free in Reinforcement Learning from Human Feedback | |
Enhancing Q&A Text Retrieval with Ranking Models: Benchmarking, fine-tuning and deploying Rerankers for RAG | |
Evidence from fMRI Supports a Two-Phase Abstraction Process in Language Models | |
Reinforcement Learning from Reflective Feedback (RLRF): Aligning and Improving LLMs via Fine-Grained Self-Reflection | |
Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT | |
Representation Tuning | |
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning | |
DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models | |
Alleviating Hallucinations in Large Language Models with Scepticism Modeling | |
SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning | |
Harmonic Reasoning in Large Language Models | |
STLM Engineering Report: Dropout | |
Towards Automated Machine Learning Research | |
Optimization Hyper-parameter Laws for Large Language Models | |
Residual Stream Analysis with Multi-Layer SAEs | |
LAST: Language Model Aware Speech Tokenization | |
A Fused Large Language Model for Predicting Startup Success | |
Attend First, Consolidate Later: On the Importance of Attention in Different LLM Layers | |
Accelerating Large Language Model Training with Hybrid GPU-based Compression | |
Training on the Benchmark Is Not All You Need | |
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning | |
LanguaShrink: Reducing Token Overhead with Psycholinguistics | |
EPO: Hierarchical LLM Agents with Environment Preference Optimization | |
Atari-GPT: Investigating the Capabilities of Multimodal Large Language Models as Low-Level Policies for Atari Games | |
Harmonized Speculative Sampling | |
Why transformers are obviously good models of language | |
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models | |
How transformers learn structured data: insights from hierarchical filtering | |
Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data | |
SLM Meets LLM: Balancing Latency, Interpretability and Consistency in Hallucination Detection | |
Search-Based LLMs for Code Optimization | |
Memorization In In-Context Learning | |
Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? | |
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | |
Demystifying the Communication Characteristics for Distributed Transformer Models | |
In-Context Learning with Representations: Contextual Generalization of Trained Transformers | |
Performance Law of Large Language Models | |
Importance Weighting Can Help Large Language Models Self-Improve | |
Acquiring Bidirectionality via Large and Small Language Models | |
Extracting Sentence Embeddings from Pretrained Transformer Models | |
Instruct Large Language Models to Generate Scientific Literature Survey Step by Step | |
LLMs can Schedule | |
A Unified Framework for Model Editing | |
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | |
Introducing the NewsPaLM MBR and QE Dataset: LLM-Generated High-Quality Parallel Data Outperforms Traditional Web-Crawled Data | |
Animate, or Inanimate, That is the Question for Large Language Models | |
Generalisation First, Memorisation Second? Memorisation Localisation for Natural Language Classification Tasks | |
How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression | |
Partial Experts Checkpoint: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training | |
From Words to Worth: Newborn Article Impact Prediction with LLM | |
Is Child-Directed Speech Effective Training Data for Language Models? | |
Automated Theorem Provers Help Improve Large Language Model Reasoning | |
SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models | |
Decoupled Vocabulary Learning Enables Zero-Shot Translation from Unseen Languages | |
Cross-layer Attention Sharing for Large Language Models | |
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs | |
Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer | |
Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models | |
On the Resilience of Multi-Agent Systems with Malicious Agents | |
Disentangling Dense Embeddings with Sparse Autoencoders | |
SentenceVAE: Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context | |
PMoE: Progressive Mixture of Experts with Asymmetric Transformer for Continual Learning | |
Adaptive Pre-training Data Detection for Large Language Models via Surprising Tokens | |
Entropy, Thermodynamics and the Geometrization of the Language Model | |
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning | |
CultureVo: The Serious Game of Utilizing Gen AI for Enhancing Cultural Intelligence | |
ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 | |
LLMs' Understanding of Natural Language Revealed | |
Mixture of Modular Experts: Distilling Knowledge from a Multilingual Teacher into Specialized Modular Language Models | |
Do Language Models Have a Critical Period for Language Acquisition? | |
Understanding Memorisation in LLMs: Dynamics, Influencing Factors, and Implications | |
Towards Effective and Efficient Continual Pre-training of Large Language Models | |
Climbing the Complexity Ladder with Expressive Attention | |
Towards More Accurate Prediction of Human Empathy and Emotion in Text and Multi-turn Conversations by Combining Advanced NLP, Transformers-based Networks, and Linguistic Methodologies | |
I Could've Asked That: Reformulating Unanswerable Questions | |
Modular Sentence Encoders: Separating Language Specialization from Cross-Lingual Alignment | |
On the Design and Analysis of LLM-Based Algorithms | |
Latent Causal Probing: A Formal Perspective on Probing with Causal Models of Data | |
A mathematical framework of intelligence and consciousness based on Riemannian Geometry | |
Enhancing Training Efficiency Using Packing with Flash Attention | |
Banishing LLM Hallucinations Requires Rethinking Generalization | |
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | |
Multi-Meta-RAG: Improving RAG for Multi-Hop Queries using Database Filtering with LLM-Extracted Metadata | |
A Notion of Complexity for Theory of Mind via Discrete World Models | |
Tree Cross Attention | |
Sentence Bottleneck Autoencoders from Transformer Language Models | |
Neural Machine Translation without Embeddings | |
Agents in Software Engineering: Survey, Landscape, and Vision | |
Emerging Reliance Behaviors in Human-AI Text Generation: Hallucinations, Data Quality Assessment, and Cognitive Forcing Functions | |
Programming Refusal with Conditional Activation Steering | |
AIPO: Improving Training Objective for Iterative Preference Optimization | |
Your Weak LLM is Secretly a Strong Teacher for Alignment | |
Mutual Theory of Mind in Human-AI Collaboration: An Empirical Study with LLM-driven AI Agents in a Real-time Shared Workspace Task | |
Fusing Dynamics Equation: A Social Opinions Prediction Algorithm with LLM-based Agents | |
CPL: Critical Planning Step Learning Boosts LLM Generalization in Reasoning Tasks | |
LLM Critics Help Catch LLM Bugs | |
Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning | |
Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models | |
Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training | |
Reasoning with Language Model is Planning with World Model | |
RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval | |
Context-aware Code Segmentation for C-to-Rust Translation using Large Language Models | |
Causal Language Modeling Can Elicit Search and Reasoning Capabilities on Logic Puzzles | |
ReCLAP: Improving Zero Shot Audio Classification by Describing Sounds | |
BERT Rediscovers the Classical NLP Pipeline | |
AI-LieDar: Examine the Trade-off Between Utility and Truthfulness in LLM Agents | |
Assessing Adversarial Robustness of Large Language Models: An Empirical Study | |
Sequential Monte Carlo Steering of Large Language Models using Probabilistic Programs | |
LLM as BT-Planner: Leveraging LLMs for Behavior Tree Generation in Robot Task Planning | |
Instigating Cooperation among LLM Agents Using Adaptive Information Modulation | |
Large Language Model Enhanced Hard Sample Identification for Denoising Recommendation | |
beeFormer: Bridging the Gap Between Semantic and Interaction Similarity in Recommender Systems | |
ComplexCodeEval: A Benchmark for Evaluating Large Code Models on More Complex Code | |
Cognitive Kernel: An Open-source Agent System towards Generalist Autopilots | |
From Text to Emoji: How PEFT-Driven Personality Manipulation Unleashes the Emoji Potential in LLMs | |
jina-embeddings-v3: Multilingual Embeddings With Task LoRA | |
Trustworthiness in Retrieval-Augmented Generation Systems: A Survey | |
On the Diagram of Thought | |
CROSS-JEM: Accurate and Efficient Cross-encoders for Short-text Ranking Tasks | |
Unleash LLMs Potential for Recommendation by Coordinating Twin-Tower Dynamic Semantic Token Generator | |
HyPA-RAG: A Hybrid Parameter Adaptive Retrieval-Augmented Generation System for AI Legal and Policy Applications | |
Expediting and Elevating Large Language Model Reasoning via Hidden Chain-of-Thought Decoding | |
Explaining Datasets in Words: Statistical Models with Natural Language Parameters | |
AudioBERT: Audio Knowledge Augmented Language Model | |
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation | |
Ferret: Federated Full-Parameter Tuning at Scale for Large Language Models | |
Qwen2.5-Coder Technical Report | |
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning | |
A Controlled Study on Long Context Extension and Generalization in LLMs | |
GRIN: GRadient-INformed MoE | |
LLMs + Persona-Plug = Personalized LLMs | |
Human-like Affective Cognition in Foundation Models | |
Designing Interfaces for Multimodal Vector Search Applications | |
Towards Fair RAG: On the Impact of Fair Ranking in Retrieval-Augmented Generation | |
A Framework for Ranking Content Providers Using Prompt Engineering and Self-Attention Network | |
Scaling FP8 training to trillion-token LLMs | |
NVLM: Open Frontier-Class Multimodal LLMs | |
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents | |
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement | |
Towards Time Series Reasoning with LLMs | |
Learning Spatially-Aware Language and Audio Embedding | |
THaMES: An End-to-End Tool for Hallucination Mitigation and Evaluation in Large Language Models | |
LOLA -- An Open-Source Massively Multilingual Large Language Model | |
Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse | |
SuperCoder2.0: Technical Report on Exploring the feasibility of LLMs as Autonomous Programmer | |
Semformer: Transformer Language Models with Semantic Planning | |
Embedding Geometries of Contrastive Language-Image Pre-Training | |
Promptriever: Instruction-Trained Retrievers Can Be Prompted Like Language Models | |
A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B | |
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs | |
Playground v3: Improving Text-to-Image Alignment with Deep-Fusion Large Language Models | |
On the limits of agency in agent-based models | |
Schrodinger's Memory: Large Language Models | |
Towards Data-Centric RLHF: Simple Metrics for Preference Dataset Comparison | |
RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation | |
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing | |
LLM-Powered Grapheme-to-Phoneme Conversion: Benchmark and Case Study | |
Stable Language Model Pre-training by Reducing Embedding Variability | |
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems | |
The Expressive Power of Transformers with Chain of Thought | |
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | |
MURI: High-Quality Instruction Tuning Datasets for Low-Resource Languages via Reverse Instructions | |
Revealing the Challenge of Detecting Character Knowledge Errors in LLM Role-Playing | |
Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation | |
Training Language Models to Self-Correct via Reinforcement Learning | |
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization | |
Enhancing E-commerce Product Title Translation with Retrieval-Augmented Generation and Large Language Models | |
Language Models Learn to Mislead Humans via RLHF | |
Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL | |
MEXMA: Token-level objectives improve sentence representations | |
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories | |
Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries | |
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning | |
MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning | |
BERT-VBD: Vietnamese Multi-Document Summarization Framework | |
Measuring Human and AI Values based on Generative Psychometrics with Large Language Models | |
RoMath: A Mathematical Reasoning Benchmark in Romanian | |
Compressing LLMs: The Truth is Rarely Pure and Never Simple | |
CLAIR-A: Leveraging Large Language Models to Judge Audio Captions | |
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines | |
Knowledge-Based Domain-Oriented Data Augmentation for Enhancing Unsupervised Sentence Embedding | |
HLLM: Enhancing Sequential Recommendations via Hierarchical Large Language Models for Item and User Modeling | |
AI Suggestions Homogenize Writing Toward Western Styles and Diminish Cultural Nuances | |
Retrieval-Augmented Test Generation: How Far Are We? | |
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning | |
RAD-Bench: Evaluating Large Language Models Capabilities in Retrieval Augmented Dialogues | |
Should RAG Chatbots Forget Unimportant Conversations? Exploring Importance and Forgetting with Psychological Insights | |
Linear Recency Bias During Training Improves Transformers' Fit to Reading Times | |
Linguistic Minimal Pairs Elicit Linguistic Similarity in Large Language Models | |
Making Large Language Models into World Models with Precondition and Effect Knowledge | |
Linguini: A benchmark for language-agnostic linguistic reasoning | |
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement | |
Dual-Layer Training and Decoding of Large Language Model with Simultaneously Thinking and Speaking | |
SLIMER-IT: Zero-Shot NER on Italian Language | |
Self-Evolutionary Large Language Models through Uncertainty-Enhanced Preference Optimization | |
Adaptive Large Language Models By Layerwise Attention Shortcuts | |
Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors | |
MindScape Study: Integrating LLM and Behavioral Sensing for Personalized AI-Driven Journaling Experiences | |
Language Models "Grok" to Copy | |
Autoregressive + Chain of Thought ≃ Recurrent: Recurrence's Role in Language Models' Computability and a Revisit of Recurrent Transformer | |
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy? | |
What You Say = What You Want? Teaching Humans to Articulate Requirements for LLMs | |
When Context Leads but Parametric Memory Follows in Large Language Models | |
SELF-[IN]CORRECT: LLMs Struggle with Discriminating Self-Generated Responses | |
Mixture of Diverse Size Experts | |
Improving LLM Reasoning with Multi-Agent Tree-of-Thought Validator Agent | |
Semi-Supervised Reward Modeling via Iterative Self-Training | |
Spectral Filters, Dark Signals, and Attention Sinks | |
Unveiling Induction Heads: Provable Training Dynamics and Feature Learning in Transformers | |
Enhancing Fault Localization Through Ordered Code Analysis with LLM Agents and Self-Reflection | |
ChainBuddy: An AI Agent System for Generating LLM Pipelines | |
ShizishanGPT: An Agricultural Large Language Model Integrating Tools and Resources | |
Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models | |
RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion | |
RRM: Robust Reward Model Training Mitigates Reward Hacking | |
AutoVerus: Automated Proof Generation for Rust Code | |
LLM Surgery: Efficient Knowledge Unlearning and Editing in Large Language Models | |
Contextual Compression in Retrieval-Augmented Generation for Large Language Models: A Survey | |
LLMs Still Can't Plan; Can LRMs? A Preliminary Evaluation of OpenAI's o1 on PlanBench | |
Minstrel: Structural Prompt Generation with Multi-Agents Coordination for Non-AI Experts | |
TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning | |
Jailbreaking Large Language Models with Symbolic Mathematics | |
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments | |
An adapted large language model facilitates multiple medical tasks in diabetes care | |
KTO: Model Alignment as Prospect Theoretic Optimization | |
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs | |
Towards Understanding Grokking: An Effective Theory of Representation Learning | |
What Makes Good In-Context Examples for GPT-3? | |
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor? | |
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping | |
Learning from Contrastive Prompts: Automated Optimization and Adaptation | |
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond | |
Beyond Fine-tuning: Unleashing the Potential of Continuous Pretraining for Clinical LLMs | |
Phantom of Latent for Large Language and Vision Models | |
Target-Aware Language Modeling via Granular Data Sampling | |
Reducing the Footprint of Multi-Vector Retrieval with Minimal Performance Impact via Token Pooling | |
A Case Study of Web App Coding with OpenAI Reasoning Models | |
DiffEditor: Enhancing Speech Editing with Semantic Enrichment and Acoustic Consistency | |
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems | |
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking | |
LLM-Assisted Visual Analytics: Opportunities and Challenges | |
Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling | |
Instruction Following without Instruction Tuning | |
OmniBench: Towards The Future of Universal Omni-Language Models | |
Retrieval Augmented Generation (RAG) and Beyond: A Comprehensive Survey on How to Make your LLMs use External Data More Wisely | |
A Survey on the Honesty of Large Language Models | |
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models | |
Merging LoRAs like Playing LEGO: Pushing the Modularity of LoRA to Extremes Through Rank-Wise Clustering | |
MOSS: Enabling Code-Driven Evolution and Context Management for AI Agents | |
Making Text Embedders Few-Shot Learners | |
BioAgents: Democratizing Bioinformatics Analysis with Multi-Agent Systems | |
Lighter And Better: Towards Flexible Context Adaptation For Retrieval Augmented Generation | |
Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA | |
EuroLLM: Multilingual Language Models for Europe | |
Small Language Models: Survey, Measurements, and Insights | |
Reward-Robust RLHF in LLMs | |
Planning in the Dark: LLM-Symbolic Planning Pipeline without Experts | |
Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning | |
Block-Attention for Low-Latency RAG | |
Federated Large Language Models: Current Progress and Future Directions | |
Visual Prompting in Multimodal Large Language Models: A Survey | |
Turn Every Application into an Agent: Towards Efficient Human-Agent-Computer Interaction with API-First LLM-Based Agents | |
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale | |
VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models | |
Adaptive Self-Supervised Learning Strategies for Dynamic On-Device LLM Personalization | |
DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling | |
Tell Me What You Don't Know: Enhancing Refusal Capabilities of Role-Playing Agents via Representation Space Analysis and Editing | |
Unsupervised Text Representation Learning via Instruction-Tuning for Zero-Shot Dense Retrieval | |
LLaMa-SciQ: An Educational Chatbot for Answering Science MCQ | |
Context-Enhanced LLM-Based Framework for Automatic Test Refactoring | |
MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks | |
RoleBreak: Character Hallucination as a Jailbreak Attack in Role-Playing Systems | |
A Survey of Low-bit Large Language Models: Basics, Systems, and Algorithms | |
Disentangling Questions from Query Generation for Task-Adaptive Retrieval | |
Boosting Healthcare LLMs Through Retrieved Context | |
FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression | |
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference | |
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale | |
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | |
NoTeeline: Supporting Real-Time Notetaking from Keypoints with Large Language Models | |
A Comprehensive Survey of Bias in LLMs: Current Landscape and Future Directions | |
Bone: Block Affine Transformation as Parameter Efficient Fine-tuning Methods for Large Language Models | |
EgoLM: Multi-Modal Language Model of Egocentric Motions | |
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions | |
BEATS: Optimizing LLM Mathematical Capabilities with BackVerify and Adaptive Disambiguate based Efficient Tree Search | |
Efficient Arbitrary Precision Acceleration for Large Language Models on GPU Tensor Cores | |
MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models | |
Discovering the Gems in Early Layers: Accelerating Long-Context LLMs with 1000x Input Token Reduction | |
Looped Transformers for Length Generalization | |
Automatic Instruction Evolving for Large Language Models | |
Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study | |
Well, that escalated quickly: The Single-Turn Crescendo Attack (STCA) | |
Infer Human's Intentions Before Following Natural Language Instructions | |
The Imperative of Conversation Analysis in the Era of LLMs: A Survey of Tasks, Techniques, and Trends | |
VectorSearch: Enhancing Document Retrieval with Semantic Embeddings and Optimized Search | |
ISO: Overlap of Computation and Communication within Sequence For LLM Inference | |
Here's Charlie! Realising the Semantic Web vision of Agents in the age of LLMs | |
Multi-language Unit Test Generation using LLMs | |
CLUE: Concept-Level Uncertainty Estimation for Large Language Models | |
Hallucination Detection in LLMs: Fast and Memory-Efficient Finetuned Models | |
Alignment-Aware Model Extraction Attacks on Large Language Models | |
Creating a Gen-AI based Track and Trace Assistant MVP (SuperTracy) for PostNL | |
Deconfounded Causality-aware Parameter-Efficient Fine-Tuning for Problem-Solving Improvement of LLMs | |
Hypothesizing Missing Causal Variables with LLMs | |
Self-Instructed Derived Prompt Generation Meets In-Context Learning: Unlocking New Potential of Black-Box LLMs | |
Membership Inference Attacks Against In-Context Learning | |
Deploying a Retrieval based Response Model for Task Oriented Dialogues | |
Prompt Compression with Context-Aware Sentence Encoding for Fast and Improved LLM Inference | |
Balancing Performance and Efficiency: A Multimodal Large Language Model Pruning Method based Image Text Interaction | |
FlashFlex: Accommodating Large Language Model Training over Heterogeneous Environment | |
Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching | |
Large Language Models Can Understand Depth from Monocular Images | |
Addition is All You Need for Energy-efficient Language Models | |
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices | |
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning | |
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models | |
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration | |
Can Models Learn Skill Composition from Examples? | |
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code | |
Hyper-Connections | |
Visual Question Decomposition on Multimodal Large Language Models | |
DiaSynth -- Synthetic Dialogue Generation Framework | |
On the Implications of Verbose LLM Outputs: A Case Study in Translation Evaluation | |
LML: Language Model Learning a Dataset for Data-Augmented Prediction | |
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models | |
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding | |
Emu3: Next-Token Prediction is All You Need | |
Learning the Latent Rules of a Game from Data: A Chess Story | |
Cottention: Linear Transformers With Cosine Attention | |
Do We Need Domain-Specific Embedding Models? An Empirical Investigation | |
Data Analysis in the Era of Generative AI | |
Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization | |
VickreyFeedback: Cost-efficient Data Construction for Reinforcement Learning from Human Feedback | |
SciDFM: A Large Language Model with Mixture-of-Experts for Science | |
Generative Retrieval Meets Multi-Graded Relevance | |
CurricuLLM: Automatic Task Curricula Design for Learning Complex Robot Skills using Large Language Models | |
An Adversarial Perspective on Machine Unlearning for AI Safety | |
Modulated Intervention Preference Optimization (MIPO): Keep the Easy, Refine the Difficult | |
HDFlow: Enhancing LLM Complex Problem-Solving with Hybrid Thinking and Dynamic Workflows | |
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making | |
A is for Absorption: Studying Feature Splitting and Absorption in Sparse Autoencoders | |
Natural Language Processing Methods for the Study of Protein-Ligand Interactions | |
Solving math word problems with process- and outcome-based feedback | |
Ingest-And-Ground: Dispelling Hallucinations from Continually-Pretrained LLMs with RAG | |
Law of the Weakest Link: Cross Capabilities of Large Language Models | |
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos | |
Atlas-Chat: Adapting Large Language Models for Low-Resource Moroccan Arabic Dialect | |
LoRA Dropout as a Sparsity Regularizer for Overfitting Control | |
Don't Transform the Code, Code the Transforms: Towards Precise Code Rewriting using LLMs | |
Embodied-RAG: General non-parametric Embodied Memory for Retrieval and Generation | |
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning | |
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation | |
Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling | |
HelpSteer2-Preference: Complementing Ratings with Preferences | |
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging | |
Quantifying Generalization Complexity for Large Language Models | |
Not All LLM Reasoners Are Created Equal | |
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks | |
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis | |
FactAlign: Long-form Factuality Alignment of Large Language Models | |
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs | |
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding | |
BordIRlines: A Dataset for Evaluating Cross-lingual Retrieval-Augmented Generation | |
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis | |
Contrastive Localized Language-Image Pre-Training | |
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models | |
Large Language Models as Markov Chains | |
Distilling an End-to-End Voice Assistant Without Instruction Training Data | |
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation | |
General Preference Modeling with Preference Representations for Aligning Language Models | |
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? | |
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data | |
OpenMathInstruct-2: Accelerating AI for Math with Massive Open-Source Instruction Data | |
FlashMask: Efficient and Rich Mask Extension of FlashAttention | |
Unleashing the Power of Large Language Models in Zero-shot Relation Extraction via Self-Prompting | |
Understanding the Human-LLM Dynamic: A Literature Survey of LLM Use in Programming Tasks | |
KV-Compress: Paged KV-Cache Compression with Variable Compression Rates per Attention Head | |
Understanding Higher-Order Correlations Among Semantic Components in Embeddings | |
Calibrating Language Models with Adaptive Temperature Scaling | |
On the Inductive Bias of Stacking Towards Improving Reasoning | |
Training Language Models to Win Debates with Self-Play Improves Judge Accuracy | |
Intelligence at the Edge of Chaos | |
Contextual Document Embeddings | |
RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning | |
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning | |
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics | |
Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | |
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment | |
AutoTrain: No-code training for state-of-the-art models | |
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models | |
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning | |
The Perfect Blend: Redefining RLHF with Mixture of Judges | |
How Much Can RAG Help the Reasoning of LLM? | |
ENTP: Encoder-only Next Token Prediction | |
Logic-of-Thought: Injecting Logic into Contexts for Full Reasoning in Large Language Models | |
On The Planning Abilities of OpenAI's o1 Models: Feasibility, Optimality, and Generalizability | |
A General Framework for Producing Interpretable Semantic Text Embeddings | |
Showing LLM-Generated Code Selectively Based on Confidence of LLMs | |
Autoregressive Large Language Models are Computationally Universal | |
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise | |
Intrinsic Evaluation of RAG Systems for Deep-Logic Questions | |
Erasing Conceptual Knowledge from Language Models | |
Selective Attention Improves Transformer | |
GenSim2: Scaling Robot Data Generation with Multi-modal and Reasoning LLMs | |
ARB-LLM: Alternating Refined Binarizations for Large Language Models | |
Horizon-Length Prediction: Advancing Fill-in-the-Middle Capabilities for Code Generation with Lookahead Planning | |
In-context Learning in Presence of Spurious Correlations | |
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark | |
ReTok: Replacing Tokenizer to Enhance Representation Efficiency in Large Language Model | |
CodeMMLU: A Multi-Task Benchmark for Assessing Code Understanding Capabilities of CodeLLMs | |
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles | |
Adaptive Inference-Time Compute: LLMs Can Predict if They Can Do Better, Even Mid-Generation | |
HELMET: How to Evaluate Long-Context Language Models Effectively and Thoroughly | |
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training | |
Efficient 1-bit tensor approximations | |
When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1 | |
Differential Transformer | |
LoTLIP: Improving Language-Image Pre-training for Long Text Understanding | |
DEPT: Decoupled Embeddings for Pre-training Language Models | |
Fast State Restoration in LLM Serving with HCache | |
TLDR: Token-Level Detective Reward Model for Large Vision Language Models | |
Reward-RAG: Enhancing RAG with Reward Driven Supervision | |
Named Clinical Entity Recognition Benchmark | |
MathHay: An Automated Benchmark for Long-Context Mathematical Reasoning in LLMs | |
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning | |
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations | |
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models | |
ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery | |
Why Do We Need Weight Decay in Modern Deep Learning? | |
SwiftKV: Fast Prefill-Optimized Inference with Knowledge-Preserving Model Transformation | |
Algorithmic Capabilities of Random Transformers | |
Inference Scaling for Long-Context Retrieval Augmented Generation | |
Preference Optimization as Probabilistic Inference | |
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models | |
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References | |
Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization | |
LongGenBench: Long-context Generation Benchmark | |
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint? | |
nGPT: Normalized Transformer with Representation Learning on the Hypersphere | |
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks | |
ToolGen: Unified Tool Retrieval and Calling via Generation | |
MA-RLHF: Reinforcement Learning from Human Feedback with Macro Actions | |
A generative framework to bridge data-driven models and scientific theories in language neuroscience | |
Hyper-multi-step: The Truth Behind Difficult Long-context Tasks | |
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents | |
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation | |
Steering Large Language Models between Code Execution and Textual Reasoning | |
TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention | |
Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach | |
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search | |
Archon: An Architecture Search Framework for Inference-Time Techniques | |
Initialization of Large Language Models via Reparameterization to Mitigate Loss Spikes | |
Data Selection via Optimal Control for Language Models | |
Upcycling Large Language Models into Mixture of Experts | |
Temporal Reasoning Transfer from Text to Video | |
TRACE: Temporal Grounding Video LLM via Causal Event Modeling | |
MM-Ego: Towards Building Egocentric Multimodal LLMs | |
One Initialization to Rule them All: Fine-tuning via Explained Variance Adaptation | |
Can Transformers Reason Logically? A Study in SAT Solving | |
Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates | |
Personalized Visual Instruction Tuning | |
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | |
Pixtral 12B | |
Self-Boosting Large Language Models with Synthetic Preference Data | |
Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA | |
Hallucinating AI Hijacking Attack: Large Language Models and Malicious Code Recommenders | |
Multimodal Situational Safety | |
AutoDAN-Turbo: A Lifelong Agent for Strategy Self-Exploration to Jailbreak LLMs | |
Multimodal Large Language Models for Inverse Molecular Design with Retrosynthetic Planning | |
CursorCore: Assist Programming through Aligning Anything | |
TinyEmo: Scaling down Emotional Reasoning via Metric Projection | |
MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders | |
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet | |
Falcon Mamba: The First Competitive Attention-free 7B Language Model | |
GLEE: A Unified Framework and Benchmark for Language-based Economic Environments | |
Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models | |
Does Spatial Cognition Emerge in Frontier Models? | |
Round and Round We Go! What makes Rotary Positional Encodings useful? | |
Large Language Model Enhanced Text-to-SQL Generation: A Survey | |
Tracking Universal Features Through Fine-Tuning and Model Merging | |
Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG | |
Exploring the Meaningfulness of Nearest Neighbor Search in High-Dimensional Space | |
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks | |
Response Tuning: Aligning Large Language Models without Instruction | |
Collective Critics for Creative Story Generation | |
LLM Self-Correction with DeCRIM: Decompose, Critique, and Refine for Enhanced Following of Instructions with Multiple Constraints | |
MEXA: Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment | |
Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System | |
Emergent properties with repeated examples | |
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization | |
PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs | |
MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents | |
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code | |
Towards Self-Improvement of LLMs via MCTS: Leveraging Stepwise Knowledge with Curriculum Preference Learning | |
SFTMix: Elevating Language Model Instruction Tuning with Mixup Recipe | |
Intriguing Properties of Large Language and Vision Models | |
Benchmarking Agentic Workflow Generation | |
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | |
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition | |
Think Twice: A Human-like Two-stage Conversational Agent for Emotional Response Generation | |
WALL-E: World Alignment by Rule Learning Improves World Model-based LLM Agents | |
Vector-ICL: In-context Learning with Continuous Vector Representations | |
Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning | |
LLM Cascade with Multi-Objective Optimal Consideration | |
No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, Even for Vigilant Users | |
The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks | |
LLMs Are In-Context Reinforcement Learners | |
Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models | |
Accelerated Preference Optimization for Large Language Model Alignment | |
How to Train Long-Context Language Models (Effectively) | |
GraphIC: A Graph-Based In-Context Example Retrieval Model for Multi-Step Reasoning | |
SimpleStrat: Diversifying Language Model Generation with Stratification | |
Mentor-KD: Making Small Language Models Better Multi-step Reasoners | |
SuperCorrect: Supervising and Correcting Language Models with Error-Driven Insights | |
Science is Exploration: Computational Frontiers for Conceptual Metaphor Theory | |
Baichuan-Omni Technical Report | |
KV Prediction for Improved Time to First Token | |
Do You Know What You Are Talking About? Characterizing Query-Knowledge Relevance For Reliable Retrieval Augmented Generation | |
Can Looped Transformers Learn to Implement Multi-step Gradient Descent for In-context Learning? | |
Adam Exploits ℓ∞-geometry of Loss Landscape via Coordinate-wise Adaptivity | |
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment | |
Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining | |
Benign Overfitting in Single-Head Attention | |
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models | |
I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy | |
PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness | |
The Accuracy Paradox in RLHF: When Better Reward Models Don't Yield Better Language Models | |
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | |
RL, but don't do anything I wouldn't do | |
From Tokens to Words: On the Inner Lexicon of LLMs | |
Neuron-Level Sequential Editing for Large Language Models | |
Mixture of Attentions For Speculative Decoding | |
Integrating Natural Language Prompting Tasks in Introductory Programming Courses | |
Benign or Not-Benign Overfitting in Token Selection of Attention Mechanism | |
Causal Inference with Large Language Model: A Survey | |
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation | |
MTU-Bench: A Multi-granularity Tool-Use Benchmark for Large Language Models | |
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI | |
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | |
PeerArg: Argumentative Peer Review with LLMs | |
LongMemEval: Benchmarking Chat Assistants on Long-Term Interactive Memory | |
Thinking LLMs: General Instruction Following with Thought Generation | |
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | |
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents | |
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models | |
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation | |
Rethinking Data Selection at Scale: Random Selection is Almost All You Need | |
The Same But Different: Structural Similarities and Differences in Multilingual Language Modeling | |
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models | |
Tree of Problems: Improving structured problem solving with compositionality | |
TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | |
Think While You Generate: Discrete Diffusion with Planned Denoising | |
Strong Model Collapse | |
Fundamental Limitations on Subquadratic Alternatives to Transformers | |
On The Computational Complexity of Self-Attention | |
Primer: Searching for Efficient Transformers for Language Modeling | |
NesTools: A Dataset for Evaluating Nested Tool Learning Abilities of Large Language Models | |
Agent-as-a-Judge: Evaluate Agents with Agents | |
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free | |
Empirical Study of Mutual Reinforcement Effect and Application in Few-shot Text Classification Tasks via Prompt | |
LLM$\times$MapReduce: Simplified Long-Sequence Processing using Large Language Models | |
What Matters in Transformers? Not All Attention is Needed | |
Proactive Agent: Shifting LLM Agents from Reactive Responses to Active Assistance | |
A Hitchhiker's Guide to Scaling Law Estimation | |
How Numerical Precision Affects Mathematical Reasoning Capabilities of LLMs | |
Survey and Evaluation of Converging Architecture in LLMs based on Footsteps of Operations | |
Agentic Information Retrieval | |
In-Context Learning Enables Robot Action Prediction in LLMs | |
Exploring Model Kinship for Merging Large Language Models | |
Insights from the Inverse: Reconstructing LLM Training Goals Through Inverse RL | |
BenTo: Benchmark Task Reduction with In-Context Transferability | |
Revealing the Barriers of Language Agents in Planning | |
ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs | |
Prompt Compression for Large Language Models: A Survey | |
Model Balancing Helps Low-data Training and Fine-tuning | |
The Moral Case for Using Language Model Agents for Recommendation | |
OMCAT: Omni Context Aware Transformer | |
FLARE: Faithful Logic-Aided Reasoning and Exploration | |
Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence | |
Persistent Topological Features in Large Language Models | |
Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions | |
Large Language Model Evaluation via Matrix Nuclear-Norm | |
ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains | |
Taming Overconfidence in LLMs: Reward Calibration in RLHF | |
Parameter-Efficient Fine-Tuning of State Space Models | |
How Do Multilingual Models Remember? Investigating Multilingual Factual Recall Mechanisms | |
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements | |
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities | |
LightRAG: Simple and Fast Retrieval-Augmented Generation | |
Large Language Model-Based Evolutionary Optimizer: Reasoning with elitism | |
$\gamma$-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | |
Can MLLMs Understand the Deep Implication Behind Chinese Images? | |
Retrospective Learning from Interactions | |
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models | |
AgentOccam: A Simple Yet Strong Baseline for LLM-Based Web Agents | |
Harnessing Webpage UIs for Text-Rich Visual Understanding | |
Looking Inward: Language Models Can Learn About Themselves by Introspection | |
PopAlign: Diversifying Contrasting Patterns for a More Comprehensive Alignment | |
Improving Multi-modal Large Language Model through Boosting Vision Capabilities | |
Persistent Pre-Training Poisoning of LLMs | |
A Comparative Study on Reasoning Patterns of OpenAI's o1 Model | |
LoLDU: Low-Rank Adaptation via Lower-Diag-Upper Decomposition for Parameter-Efficient Fine-Tuning | |
Remember, Retrieve and Generate: Understanding Infinite Visual Concepts as Your Personalized Assistant | |
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems | |
SBI-RAG: Enhancing Math Word Problem Solving for Students through Schema-Based Instruction and Retrieval-Augmented Generation | |
Roadmap towards Superhuman Speech Understanding using Large Language Models | |
Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation | |
A Little Human Data Goes A Long Way | |
AERO: Softmax-Only LLMs for Efficient Private Inference | |
Merge to Learn: Efficiently Adding Skills to Language Models with Model Merging | |
Improving Instruction-Following in Language Models through Activation Steering | |
JudgeBench: A Benchmark for Evaluating LLM-based Judges | |
From Commands to Prompts: LLM-based Semantic File System for AIOS | |
MoH: Multi-Head Attention as Mixture-of-Head Attention | |
When Attention Sink Emerges in Language Models: An Empirical View | |
Minimum Tuning to Unlock Long Output from LLMs with High Quality Data as the Key | |
FlatQuant: Flatness Matters for LLM Quantization | |
MedMobile: A mobile-sized language model with expert-level clinical capabilities | |
Untie the Knots: An Efficient Data Augmentation Strategy for Long-Context Pre-Training in Language Models | |
Evaluating Open-Source Sparse Autoencoders on Disentangling Factual Knowledge in GPT-2 Small | |
SPEER: Sentence-Level Planning of Long Clinical Summaries via Embedded Entity Retrieval | |
SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction | |
Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs | |
TopoLM: brain-like spatio-functional organization in a topographic language model | |
Global Lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers | |
Are AI Detectors Good Enough? A Survey on Quality of Datasets With Machine-Generated Texts | |
GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings | |
Teaching Models to Balance Resisting and Accepting Persuasion | |
Do LLMs "know" internally when they follow instructions? | |
CaTs and DAGs: Integrating Directed Acyclic Graphs with Transformers and Fully-Connected Neural Networks for Causally Constrained Predictions | |
Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning | |
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models | |
Goal Inference from Open-Ended Dialog | |
A Common Pitfall of Margin-based Language Model Alignment: Gradient Entanglement | |
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs | |
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation | |
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media | |
Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces | |
Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization | |
SymNoise: Advancing Language Model Fine-tuning with Symmetric Noise | |
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance | |
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution | |
Pre-training Distillation for Large Language Models: A Design Space Exploration | |
Improve Vision Language Model Chain-of-thought Reasoning | |
RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style | |
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages | |
Baichuan Alignment Technical Report | |
SemiEvol: Semi-supervised Fine-tuning for LLM Adaptation | |
Decomposing The Dark Matter of Sparse Autoencoders | |
Sparse Universal Transformer | |
Bridging the Training-Inference Gap in LLMs by Leveraging Self-Generated Tokens | |
Diverging Preferences: When do Annotators Disagree and do Models Know? | |
Do LLMs estimate uncertainty well in instruction-following? | |
Large Language Models Are Overparameterized Text Encoders | |
Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts | |
Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs | |
CBT-Bench: Evaluating Large Language Models on Assisting Cognitive Behavior Therapy | |
Generative Reward Models | |
Meta-Chunking: Learning Efficient Text Segmentation via Logical Perception | |
Content Enhanced BERT-based Text-to-SQL Generation | |
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing | |
SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | |
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities | |
Mix Data or Merge Models? Optimizing for Diverse Multi-Task Learning | |
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment | |
Cascade Reward Sampling for Efficient Decoding-Time Alignment | |
Lemur: Log Parsing with Entropy Sampling and Chain-of-Thought Merging | |
Corpus Synthesis for Zero-shot ASR domain Adaptation using Large Language Models | |
Data Agnostic RoBERTa-based Natural Language to SQL Query Generation | |
Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation | |
Selecting Influential Samples for Long Context Alignment via Homologous Models' Guidance and Contextual Awareness Measurement | |
Hallucination Detox: Sensitive Neuron Dropout (SeND) for Large Language Model Training | |
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant | |
In-context learning and Occam's razor | |
Router-Tuning: A Simple and Effective Approach for Enabling Dynamic-Depth in Transformers | |
Zero-shot Model-based Reinforcement Learning using Large Language Models | |
SMART: Self-learning Meta-strategy Agent for Reasoning Tasks | |
Mesa-Extrapolation: A Weave Position Encoding Method for Enhanced Extrapolation in LLMs | |
Transformers are Efficient Compilers, Provably | |
LongReward: Improving Long-context Large Language Models with AI Feedback | |
Automatically Interpreting Millions of Features in Large Language Models | |
You can remove GPT2's LayerNorm by fine-tuning | |
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning | |
MiniPLM: Knowledge Distillation for Pre-Training Language Models | |
Value Residual Learning For Alleviating Attention Concentration In Transformers | |
LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging | |
Aligning Large Language Models via Self-Steering Optimization | |
Math Neurosurgery: Isolating Language Models' Math Reasoning Abilities Using Only Forward Passes | |
Beyond Retrieval: Generating Narratives in Conversational Recommender Systems | |
Bridging Search and Recommendation in Generative Retrieval: Does One Task Help the Other? | |
STAR: A Simple Training-free Approach for Recommendations using Large Language Models | |
SouLLMate: An Application Enhancing Diverse Mental Health Support with Adaptive LLMs, Prompt Engineering, and RAG Techniques | |
EvoPress: Towards Optimal Dynamic Model Compression via Evolutionary Search | |
Improving Pinterest Search Relevance Using Large Language Models | |
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data | |
Pyramid Vector Quantization for LLMs | |
TP-Eval: Tap Multimodal LLMs' Potential in Evaluation by Customizing Prompts | |
LongRAG: A Dual-Perspective Retrieval-Augmented Generation Paradigm for Long-Context Question Answering | |
Stick-breaking Attention | |
SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains | |
Scaling Diffusion Language Models via Adaptation from Autoregressive Models | |
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation | |
Frontiers in Intelligent Colonoscopy | |
Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy | |
LLM-based Optimization of Compound AI Systems: A Survey | |
Improving Parallel Program Performance Through DSL-Driven Code Generation with LLM Optimizers | |
M-RewardBench: Evaluating Reward Models in Multilingual Settings | |
MedINST: Meta Dataset of Biomedical Instructions | |
ALTA: Compiler-Based Analysis of Transformers | |
SmartRAG: Jointly Learn RAG-Related Tasks From the Environment Feedback | |
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | |
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms | |
Should We Really Edit Language Models? On the Evaluation of Edited Language Models | |
Why Does the Effective Context Length of LLMs Fall Short? | |
RRADistill: Distilling LLMs' Passage Ranking Ability for Document Re-Ranking of Long-Tail Queries in a Search Engine | |
Unleashing Reasoning Capability of LLMs via Scalable Question Synthesis from Scratch | |
LOGO -- Long cOntext aliGnment via efficient preference Optimization | |
CCI3.0-HQ: a large-scale Chinese dataset of high quality designed for pre-training large language models | |
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs | |
Multi-Draft Speculative Sampling: Canonical Architectures and Theoretical Limits | |
ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | |
Language Models are Symbolic Learners in Arithmetic | |
Balancing Label Quantity and Quality for Scalable Elicitation | |
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm | |
AutoRAG: Automated Framework for optimization of Retrieval Augmented Generation Pipeline | |
SpinQuant: LLM quantization with learned rotations | |
WAFFLE: Multi-Modal Model for Automated Front-End Development | |
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning | |
FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | |
Taipan: Efficient and Expressive State Space Language Models with Selective Attention | |
Can Knowledge Editing Really Correct Hallucinations? | |
When "A Helpful Assistant" Is Not Really Helpful: Personas in System Prompts Do Not Improve Performances of Large Language Models | |
Distill Visual Chart Reasoning Ability from LLMs to MLLMs | |
A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs | |
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning | |
Provably Robust Watermarks for Open-Source Language Models | |
DeCoRe: Decoding by Contrasting Retrieval Heads to Mitigate Hallucinations | |
Rethinking Softmax: Self-Attention with Polynomial Activations | |
SIKeD: Self-guided Iterative Knowledge Distillation for mathematical reasoning | |
Understanding Players as if They Are Talking to the Game in a Customized Language: A Pilot Study | |
The Nature of Mathematical Modeling and Probabilistic Optimization Engineering in Generative AI | |
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models | |
ZIP-FIT: Embedding-Free Data Selection via Compression-Based Alignment | |
Future Token Prediction -- Causal Language Modelling with Per-Token Semantic State Vector for Multi-Token Prediction | |
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | |
Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering | |
DreamLIP: Language-Image Pre-training with Long Captions | |
Inductive Biases and Variable Creation in Self-Attention Mechanisms | |
An LLM Agent for Automatic Geospatial Data Analysis | |
EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning | |
Duo-LLM: A Framework for Studying Adaptive Computation in Large Language Models | |
Long Term Memory: The Foundation of AI Self-Evolution | |
From Single to Multi: How LLMs Hallucinate in Multi-Document Summarization | |
SimpleToM: Exposing the Gap between Explicit ToM Inference and Implicit ToM Application in LLMs | |
LeanAgent: Lifelong Learning for Formal Theorem Proving | |
Little Giants: Synthesizing High-Quality Embedding Data at Scale | |
Beyond position: how rotary embeddings shape representations and memory in autoregressive transformers | |
A Survey of Conversational Search | |
Explaining Graph Neural Networks with Large Language Models: A Counterfactual Perspective for Molecular Property Prediction | |
How LLMs Aid in UML Modeling: An Exploratory Study with Novice Analysts | |
Teach Multimodal LLMs to Comprehend Electrocardiographic Images | |
Knowledge Graph Enhanced Language Agents for Recommendation | |
AgentSense: Benchmarking Social Intelligence of Language Agents through Interactive Scenarios | |
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training | |
Fictitious Synthetic Data Can Improve LLM Factuality via Prerequisite Learning | |
VisionCoder: Empowering Multi-Agent Auto-Programming for Image Processing with Hybrid LLMs | |
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark | |
Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback | |
Counting Ability of Large Language Models and Impact of Tokenization | |
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | |
Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance | |
Reflection-Bench: probing AI intelligence with reflection | |
PAPILLON: PrivAcy Preservation from Internet-based and Local Language MOdel ENsembles | |
Analysing the Residual Stream of Language Models Under Knowledge Conflicts | |
CoqPilot, a plugin for LLM-based generation of proofs | |
Measuring memorization through probabilistic discoverable extraction | |
Computational Bottlenecks of Training Small-scale Large Language Models | |
Mixture of Parrots: Experts improve memorization more than reasoning | |
M2rc-Eval: Massively Multilingual Repository-level Code Completion Evaluation | |
Beyond Autoregression: Fast LLMs via Self-Distillation Through Time | |
A Survey of Small Language Models | |
HoPE: A Novel Positional Encoding Without Long-Term Decay for Enhanced Context Awareness and Extrapolation | |
Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction | |
LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation | |
KD-LoRA: A Hybrid Approach to Efficient Fine-Tuning with LoRA and Knowledge Distillation | |
MrT5: Dynamic Token Merging for Efficient Byte-level Language Models | |
Plan$\times$RAG: Planning-guided Retrieval Augmented Generation | |
Faster WIND: Accelerating Iterative Best-of-$N$ Distillation for LLM Alignment | |
Language Models And A Second Opinion Use Case: The Pocket Professional | |
Fast Best-of-N Decoding via Speculative Rejection | |
UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers | |
RARe: Retrieval Augmented Retrieval with In-Context Examples | |
Towards Next-Generation LLM-based Recommender Systems: A Survey and Beyond | |
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation | |
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction | |
Large Language Models Reflect the Ideology of their Creators | |
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration | |
A Survey on Data Synthesis and Augmentation for Large Language Models | |
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | |
SocialGPT: Prompting LLMs for Social Relation Reasoning via Greedy Segment Optimization | |
EoRA: Training-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation | |
Rephrasing natural text data with different languages and quality levels for Large Language Model pre-training | |
Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA | |
Understanding Synthetic Context Extension via Retrieval Heads | |
Matryoshka: Learning to Drive Black-Box LLMs with LLMs | |
The Geometry of Concepts: Sparse Autoencoder Feature Structure | |
Attacking Vision-Language Computer Agents via Pop-ups | |
OpenWebVoyager: Building Multimodal Web Agents via Iterative Real-World Exploration, Feedback and Optimization | |
Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse | |
CLEAR: Character Unlearning in Textual and Visual Modalities | |
Aligning Audio-Visual Joint Representations with an Agentic Workflow | |
Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval | |
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters | |
CORAL: Benchmarking Multi-turn Conversational Retrieval-Augmentation Generation | |
Distinguishing Ignorance from Error in LLM Hallucinations | |
Learning and Unlearning of Fabricated Knowledge in Language Models | |
Decoding Dark Matter: Specialized Sparse Autoencoders for Interpreting Rare Concepts in Foundation Models | |
On the Role of Depth and Looping for In-Context Learning with Task Diversity | |
Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' | |
Natural Language Processing for the Legal Domain: A Survey of Tasks, Datasets, Models, and Challenges | |
Zero-Shot Dense Retrieval with Embeddings from Relevance Feedback | |
Accelerating Direct Preference Optimization with Prefix Sharing | |
AutoMIR: Effective Zero-Shot Medical Information Retrieval without Relevance Labels | |
QTIP: Quantization with Trellises and Incoherence Processing | |
EMMA: End-to-End Multimodal Model for Autonomous Driving | |
SciPIP: An LLM-based Scientific Paper Idea Proposer | |
Zipfian Whitening | |
On Memorization of Large Language Models in Logical Reasoning | |
Stealing User Prompts from Mixture of Experts | |
Toxicity of the Commons: Curating Open-Source Pre-Training Data | |
RuleRAG: Rule-guided retrieval-augmented generation with language models for question answering | |
UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function | |
SelfCodeAlign: Self-Alignment for Code Generation | |
Constraint Back-translation Improves Complex Instruction Following of Large Language Models | |
Nearest Neighbor Normalization Improves Multimodal Retrieval | |
Language Models can Self-Lengthen to Generate Long Texts | |
Beyond Content Relevance: Evaluating Instruction Following in Retrieval Models | |
Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts | |
Weight decay induces low-rank attention layers | |
What Happened in LLMs Layers when Trained for Fast vs. Slow Thinking: A Gradient Perspective | |
Can Models Help Us Create Better Models? Evaluating LLMs as Data Scientists | |
Toward Understanding In-context vs. In-weight Learning | |
Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks | |
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments | |
GlotCC: An Open Broad-Coverage CommonCrawl Corpus and Pipeline for Minority Languages | |
AAAR-1.0: Assessing AI's Potential to Assist Research | |
Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective | |
Failure Modes of LLMs for Causal Reasoning on Narratives | |
Are Decoder-Only Large Language Models the Silver Bullet for Code Search? | |
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks | |
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source-Cited Responses | |
$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources | |
Physics in Next-token Prediction | |
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization | |
WebRL: Training LLM Web Agents via Self-Evolving Online Curriculum Reinforcement Learning | |
Sparsing Law: Towards Large Language Models with Greater Activation Sparsity | |
PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance | |
GPT or BERT: why not both? | |
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | |
SALSA: Soup-based Alignment Learning for Stronger Adaptation in RLHF | |
Thinking Forward and Backward: Effective Backward Planning with Large Language Models | |
Context Parallelism for Scalable Million-Token Inference | |
RAGViz: Diagnose and Visualize Retrieval-Augmented Generation | |
DynaSaur: Large Language Agents Beyond Predefined Actions | |
Swan and ArabicMTEB: Dialect-Aware, Arabic-Centric, Cross-Lingual, and Cross-Cultural Embedding Models and Benchmarks | |
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding | |
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | |
Survey of Cultural Awareness in Language Models: Text and Beyond | |
LLM-KT: A Versatile Framework for Knowledge Transfer from Large Language Models to Collaborative Filtering | |
Multi-expert Prompting Improves Reliability, Safety, and Usefulness of Large Language Models | |
E2E-AFG: An End-to-End Model with Adaptive Filtering for Retrieval-Augmented Generation | |
Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation | |
GRS-QA -- Graph Reasoning-Structured Question Answering Dataset | |
BitNet a4.8: 4-bit Activations for 1-bit LLMs | |
Beyond Utility: Evaluating LLM as Recommender | |
Rationale-Guided Retrieval Augmented Generation for Medical Question Answering | |
Personalization of Large Language Models: A Survey | |
AndroidLab: Training and Systematic Benchmarking of Android Autonomous Agents | |
How Does Critical Batch Size Scale in Pre-training? | |
Scaling Optimal LR Across Token Horizons | |
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding | |
Not All Memories are Created Equal: Learning to Forget by Expiring | |
Inference Optimal VLMs Need Only One Visual Token but Larger Models | |
HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems | |
Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent | |
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge | |
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs | |
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution | |
Sample-Efficient Alignment for LLMs | |
LLaMo: Large Language Model-based Molecular Graph Assistant | |
Controlling Language and Diffusion Models by Transporting Activations | |
A Scalable Communication Protocol for Networks of Large Language Models | |
ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | |
Lightning IR: Straightforward Fine-tuning and Inference of Transformer-based Language Models for Information Retrieval | |
Wave Network: An Ultra-Small Language Model | |
Model Equality Testing: Which Model Is This API Serving? | |
A linguistic analysis of undesirable outcomes in the era of generative AI | |
Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level | |
Long Context RAG Performance of Large Language Models | |
LASER: Attention with Exponential Transformation | |
Photon: Federated LLM Pre-Training | |
How Transformers Solve Propositional Logic Problems: A Mechanistic Analysis | |
M3SciQA: A Multi-Modal Multi-Document Scientific QA Benchmark for Evaluating Foundation Models | |
Evaluation data contamination in LLMs: how do we measure it and (when) does it matter? | |
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba | |
MRJ-Agent: An Effective Jailbreak Agent for Multi-Round Dialogue | |
SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models | |
Can LLMs make trade-offs involving stipulated pain and pleasure states? | |
Improbable Bigrams Expose Vulnerabilities of Incomplete Tokens in Byte-Level Tokenizers | |
Formal Theorem Proving by Rewarding LLMs to Decompose Proofs Hierarchically | |
Teaching Models to Improve on Tape | |
Evolving Alignment via Asymmetric Self-Play | |
Scaling LLM Inference with Optimized Sample Compute Allocation | |
Self-Consistency Preference Optimization | |
Tiny Transformers Excel at Sentence Compression | |
Polynomial Composition Activations: Unleashing the Dynamics of Large Language Models | |
Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination | |
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond | |
A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness | |
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks | |
LoRA vs Full Fine-tuning: An Illusion of Equivalence | |
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models | |
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models | |
RetrieveGPT: Merging Prompts and Mathematical Models for Enhanced Code-Mixed Information Retrieval | |
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model | |
LSHBloom: Memory-efficient, Extreme-scale Document Deduplication | |
Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks? | |
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos | |
Towards Reliable Alignment: Uncertainty-aware RLHF | |
Abrupt Learning in Transformers: A Case Study on Matrix Completion | |
MultiTok: Variable-Length Tokenization for Efficient LLMs Adapted from LZW Compression | |
O1 Replication Journey: A Strategic Progress Report -- Part 1 | |
KVSharer: Efficient Inference via Layer-Wise Dissimilar KV Cache Sharing | |
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition | |
Methods of improving LLM training stability | |
1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | |
CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts | |
Generalized Probabilistic Attention Mechanism in Transformers | |
Economic Anthropology in the Era of Generative Artificial Intelligence | |
Latent Space Chain-of-Embedding Enables Output-free LLM Self-Evaluation | |
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | |
A Systematic Study of Cross-Layer KV Sharing for Efficient LLM Inference | |
MoDification: Mixture of Depths Made Easy | |
Speciesism in Natural Language Processing Research | |
Reducing the Transformer Architecture to a Minimum | |
MoR: Mixture of Ranks for Low-Rank Adaptation Tuning | |
Metacognitive Monitoring: A Human Ability Beyond Generative Artificial Intelligence | |
Hypothesis Testing the Circuit Hypothesis in LLMs | |
FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression | |
Theoretical Analysis of Hierarchical Language Recognition and Generation by Transformers without Positional Encoding | |
Conformity in Large Language Models | |
Mitigating Frequency Bias and Anisotropy in Language Model Pre-Training with Syntactic Smoothing | |
A Case for AI Consciousness: Language Agents and Global Workspace Theory | |
Local and Global Decoding in Text Generation | |
SeedLM: Compressing LLM Weights into Seeds of Pseudo-Random Generators | |
Geometric Signatures of Compositionality Across a Language Model's Lifetime | |
Is Parameter Collision Hindering Continual Learning in LLMs? | |
Reverse Modeling in Large Language Models | |
On the Proper Treatment of Tokenization in Psycholinguistics | |
Post-edits Are Preferences Too | |
Theoretical Insights into Fine-Tuning Attention Mechanism: Generalization and Optimization | |
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity | |
EmbedLLM: Learning Compact Representations of Large Language Models | |
Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | |
Mitigating Memorization In Language Models | |
House of Cards: Massive Weights in LLMs | |
U-shaped and Inverted-U Scaling behind Emergent Abilities of Large Language Models | |
Sparse Autoencoders Reveal Temporal Difference Learning in Large Language Models | |
Investigating the Synergistic Effects of Dropout and Residual Connections on Language Model Training | |
RisingBALLER: A player is a token, a match is a sentence, A path towards a foundational model for football players data analytics | |
MoS: Unleashing Parameter Efficiency of Low-Rank Adaptation with Mixture of Shards | |
Self-Updatable Large Language Models with Parameter Integration | |
Are LLMs Aware that Some Questions are not Open-ended? | |
Vision Language Models See What You Want but not What You See | |
A Looming Replication Crisis in Evaluating Behavior in Language Models? Evidence and Solutions | |
1 Trillion Token (1TT) Platform: A Novel Framework for Efficient Data Sharing and Compensation in Large Language Models | |
Can LLMs Really Learn to Translate a Low-Resource Language from One Grammar Book? | |
M3DocRAG: Multi-modal Retrieval is What You Need for Multi-page Multi-document Understanding | |
Analyzing The Language of Visual Tokens | |
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities | |
Model merging with SVD to tie the Knots | |
Best Practices for Distilling Large Language Models into BERT for Web Search Ranking | |
Interpretable Language Modeling via Induction-head Ngram Models | |
Unlearning in- vs. out-of-distribution data in LLMs under gradient-based method | |
GUI Agents with Foundation Models: A Comprehensive Survey | |
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks | |
DELIFT: Data Efficient Language model Instruction Fine Tuning | |
Aioli: A Unified Optimization Framework for Language Model Data Mixing | |
LBPE: Long-token-first Tokenization to Improve Large Language Models | |
Balancing Pipeline Parallelism with Vocabulary Parallelism | |
Fox-1 Technical Report | |
STAND-Guard: A Small Task-Adaptive Content Moderation Model | |
Alopex: A Computational Framework for Enabling On-Device Function Calls with LLMs | |
CodeLutra: Boosting LLM Code Generation via Preference-Guided Refinement | |
Q-SFT: Q-Learning for Language Models via Supervised Fine-Tuning | |
FineTuneBench: How well do commercial fine-tuning APIs infuse knowledge into LLMs? | |
Reflections from the 2024 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry | |
Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale | |
Towards Interpreting Language Models: A Case Study in Multi-Hop Reasoning | |
LLMs as Research Tools: A Large Scale Survey of Researchers' Usage and Perceptions | |
Scattered Forest Search: Smarter Code Space Exploration with LLMs | |
Parameter-Efficient Fine-Tuning of Large Language Models for Unit Test Generation: An Empirical Study | |
RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models | |
An Early FIRST Reproduction and Improvements to Single-Token Decoding for Fast Listwise Reranking | |
ZipNN: Lossless Compression for AI Models | |
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation | |
Efficient Constant-Space Multi-Vector Retrieval | |
FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI | |
Number Cookbook: Number Understanding of Language Models and How to Improve It | |
Mixtures of In-Context Learners | |
Beyond Text: Optimizing RAG with Multimodal Inputs for Industrial Applications | |
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics | |
Investigating the Role of Prompting and External Tools in Hallucination Rates of Large Language Models | |
AFlow: Automating Agentic Workflow Generation | |
Recycled Attention: Efficient inference for long-context language models | |
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding | |
Counterfactual Generation from Language Models | |
Chinese SimpleQA: A Chinese Factuality Evaluation for Large Language Models | |
Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models | |
IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization | |
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM | |
Does your LLM truly unlearn? An embarrassingly simple approach to recover unlearned knowledge | |
Game-theoretic LLM: Agent Workflow for Negotiation Games | |
Ablation is Not Enough to Emulate DPO: How Neuron Dynamics Drive Toxicity Reduction | |
NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts | |
GitChameleon: Unmasking the Version-Switching Capabilities of Code Generation Models | |
More Expressive Attention with Negative Weights | |
Retrieval or Global Context Understanding? On Many-Shot In-Context Learning for Long-Context Evaluation | |
LLM-Neo: Parameter Efficient Knowledge Distillation for Large Language Models | |
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering | |
Learning Code Preference via Synthetic Evolution | |
Energy Efficient Protein Language Models: Leveraging Small Language Models with LoRA for Controllable Protein Generation | |
Scaling Laws for Precision | |
Trustful LLMs: Customizing and Grounding Text Generation with Knowledge Bases and Dual Decoders | |
RedCode: Risky Code Execution and Generation Benchmark for Code Agents | |
Likelihood as a Performance Gauge for Retrieval-Augmented Generation | |
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows | |
Entropy Controllable Direct Preference Optimization | |
SecEncoder: Logs are All You Need in Security | |
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples | |
Toward Optimal Search and Retrieval for RAG | |
The Super Weight in Large Language Models | |
Multi-Modal Forecaster: Jointly Predicting Time Series and Textual Data | |
What Should Baby Models Read? Exploring Sample-Efficient Data Composition on Model Performance | |
Sufficient Context: A New Lens on Retrieval Augmented Generation Systems | |
Towards Low-bit Communication for Tensor Parallel LLM Inference | |
What Do Learning Dynamics Reveal About Generalization in LLM Reasoning? | |
The Crucial Role of Samplers in Online Direct Preference Optimization | |
SetLexSem Challenge: Using Set Operations to Evaluate the Lexical and Semantic Robustness of Language Models | |
Stronger Models are NOT Stronger Teachers for Instruction Tuning | |
Hardware and Software Platform Inference | |
Direct Preference Optimization Using Sparse Feature-Level Constraints | |
An Empirical Study on LLM-based Agents for Automated Bug Fixing | |
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection | |
Can sparse autoencoders be used to decompose and interpret steering vectors? | |
Dynamic Subset Tuning: Expanding the Operational Range of Parameter-Efficient Training for Large Language Models | |
Large Language Models Can Self-Improve in Long-context Reasoning | |
Language Models as Causal Effect Generators | |
Natural Language Reinforcement Learning | |
Model Stealing for Any Low-Rank Language Model | |
Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs | |
XiYan-SQL: A Multi-Generator Ensemble Framework for Text-to-SQL | |
Controllable Context Sensitivity and the Knob Behind It | |
LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models | |
Pie: Pooling CPU Memory for LLM Inference | |
Cut Your Losses in Large-Vocabulary Language Models | |
LLMStinger: Jailbreaking LLMs using RL fine-tuned LLMs | |
Dynamic Rewarding with Prompt Optimization Enables Tuning-free Self-Alignment of Language Models | |
A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look | |
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? | |
Squeezed Attention: Accelerating Long Context Length LLM Inference | |
Hermes: A Large Language Model Framework on the Journey to Autonomous Networks | |
Number it: Temporal Grounding Videos like Flipping Manga | |
The Surprising Effectiveness of Test-Time Training for Abstract Reasoning | |
Benchmarking Distributional Alignment of Large Language Models | |
Towards Edge General Intelligence via Large Language Models: Opportunities and Challenges | |
LLaVA-o1: Let Vision Language Models Reason Step-by-Step | |
The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use | |
Xmodel-1.5: An 1B-scale Multilingual LLM | |
MARS: Unleashing the Power of Variance Reduction for Training Large Models | |
Generative Agent Simulations of 1,000 People | |
Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters | |
Drowning in Documents: Consequences of Scaling Reranker Inference | |
Top-$nσ$: Not All Logits Are You Need | |
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | |
Fine-tuning large language models for domain adaptation: Exploration of training strategies, scaling, model merging and synergistic capabilities | |
Closing the Curious Case of Neural Text Degeneration | |
Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models | |
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering | |
LLäMmlein: Compact and Competitive German-Only Language Models from Scratch | |
Analyzing Pokémon and Mario Streamers' Twitch Chat with LLM-based User Embeddings | |
BlueLM-V-3B: Algorithm and System Co-Design for Multimodal Large Language Models on Mobile Devices | |
Evaluating the role of `Constitutions' for learning from AI feedback | |
SlimLM: An Efficient Small Language Model for On-Device Document Assistance | |
Adaptive Decoding via Latent Preference Optimization | |
Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering | |
MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning | |
Bi-Mamba: Towards Accurate 1-Bit State Space Models | |
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration | |
FedCoLLM: A Parameter-Efficient Federated Co-tuning Framework for Large and Small Language Models | |
Steering Language Model Refusal with Sparse Autoencoders | |
VersaTune: Fine-Tuning Multi-Ability LLMs Efficiently | |
MoE-Lightning: High-Throughput MoE Inference on Memory-constrained GPUs | |
Beyond Human-Like Processing: Large Language Models Perform Equivalently on Forward and Backward Scientific Text | |
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration | |
Does Prompt Formatting Have Any Impact on LLM Performance? | |
KuaiFormer: Transformer-Based Retrieval at Kuaishou | |
Empowering Meta-Analysis: Leveraging Large Language Models for Scientific Synthesis | |
Evaluating Tokenizer Performance of Large Language Models Across Official Indian Languages | |
Technical Report: Enhancing LLM Reasoning with Reward-guided Tree Search | |
UnifiedCrawl: Aggregated Common Crawl for Affordable Adaptation of LLMs on Low-Resource Languages | |
RedPajama: an Open Dataset for Training Large Language Models | |
Building Trust: Foundations of Security, Safety and Transparency in AI | |
A Taxonomy of AgentOps for Enabling Observability of Foundation Model based Agents | |
When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training | |
On the Way to LLM Personalization: Learning to Remember User Conversations | |
VideoAutoArena: An Automated Arena for Evaluating Large Multimodal Models in Video Analysis through User Simulation | |
MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers | |
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues | |
SymDPO: Boosting In-Context Learning of Large Multimodal Models with Symbol Demonstration Direct Preference Optimization | |
Refusal in LLMs is an Affine Function | |
Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents | |
FreeAL: Towards Human-Free Active Learning in the Era of Large Language Models | |
ORID: Organ-Regional Information Driven Framework for Radiology Report Generation | |
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models | |
Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions | |
OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs | |
Hymba: A Hybrid-head Architecture for Small Language Models | |
Are Large Language Models Memorizing Bug Benchmarks? | |
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training | |
Ultra-Sparse Memory Network | |
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization | |
Sparse Upcycling: Inference Inefficient Finetuning | |
ChatGPT in Research and Education: Exploring Benefits and Threats | |
Merging in a Bottle: Differentiable Adaptive Merging (DAM) and the Path from Averaging to Automation | |
Scaling Laws for Reward Model Overoptimization | |
Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | |
Patience Is The Key to Large Language Model Reasoning | |
Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations | |
Auto-Regressive Next-Token Predictors are Universal Learners | |
Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models | |
A Reproducibility and Generalizability Study of Large Language Models for Query Generation | |
Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models | |
Disentangling Memory and Reasoning Ability in Large Language Models | |
RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | |
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | |
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models | |
Evaluating the Robustness of Analogical Reasoning in Large Language Models | |
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training | |
One to rule them all: natural language to bind communication, perception and action | |
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models | |
Understanding LLM Embeddings for Regression | |
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games | |
A Flexible Large Language Models Guardrail Development Methodology Applied to Off-Topic Prompt Detection | |
Conversational Medical AI: Ready for Practice | |
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge | |
MH-MoE: Multi-Head Mixture-of-Experts | |
O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson? | |
From CISC to RISC: language-model guided assembly transpilation | |
Predicting Emergent Capabilities by Finetuning | |
LLMs Do Not Think Step-by-step In Implicit Reasoning | |
Knowledge Transfer Across Modalities with Natural Language Supervision | |
The Impossible Test: A 2024 Unsolvable Dataset and A Chance for an AGI Quiz | |
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens | |
Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration | |
SketchAgent: Language-Driven Sequential Sketch Generation | |
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token | |
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem | |
VLRewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models | |
BPP-Search: Enhancing Tree of Thought Reasoning for Mathematical Modeling Problem Solving | |
2D Matryoshka Training for Information Retrieval | |
Star Attention: Efficient LLM Inference over Long Sequences | |
Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering | |
Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts? | |
Self-Generated Critiques Boost Reward Modeling for Language Models | |
All Languages Matter: Evaluating LMMs on Culturally Diverse 100 Languages | |
Multi-modal Retrieval Augmented Multi-modal Generation: A Benchmark, Evaluate Metrics and Strong Baselines | |
Low-Rank Correction for Quantized LLMs | |
SALOVA: Segment-Augmented Long Video Assistant for Targeted Retrieval and Routing in Long-Form Video Analysis | |
A Survey on LLM-as-a-Judge | |
From MTEB to MTOB: Retrieval-Augmented Classification for Descriptive Grammars | |
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set | |
The Extractive-Abstractive Spectrum: Uncovering Verifiability Trade-offs in LLM Generations | |
MME-Survey: A Comprehensive Survey on Evaluation of Multimodal LLMs | |
MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts | |
LlaMaVAE: Guiding Large Language Model Generation via Continuous Latent Sentence Spaces | |
Inference Scaling $\scriptsize\mathtt{F}$Laws: The Limits of LLM Resampling with Imperfect Verifiers | |
Boundless Socratic Learning with Language Games | |
Draft Model Knows When to Stop: A Self-Verification Length Policy for Speculative Decoding | |
ChatRex: Taming Multimodal LLM for Joint Perception and Understanding | |
Large Language Model-Brained GUI Agents: A Survey | |
Training and Evaluating Language Models with Template-based Data Generation | |
Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models | |
MALMM: Multi-Agent Large Language Models for Zero-Shot Robotics Manipulation | |
HyperSeg: Towards Universal Visual Segmentation with Large Language Model | |
LoLCATs: On Low-Rank Linearizing of Large Language Models | |
AI as Humanity's Salieri: Quantifying Linguistic Creativity of Language Models via Systematic Attribution of Machine Text against Web Text | |
o1-Coder: an o1 Replication for Coding | |
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format | |
Llama Guard 3-1B-INT4: Compact and Efficient Safeguard for Human-AI Conversations | |
HARec: Hyperbolic Graph-LLM Alignment for Exploration and Exploitation in Recommender Systems | |
LEADRE: Multi-Faceted Knowledge Enhanced LLM Empowered Display Advertisement Recommender System | |
FastRAG: Retrieval Augmented Generation for Semi-structured Data | |
Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning | |
AfriMed-QA: A Pan-African, Multi-Specialty, Medical Question-Answering Benchmark Dataset | |
AIGS: Generating Science from AI-Powered Automated Falsification | |
FinRobot: AI Agent for Equity Research and Valuation with Large Language Models | |
Two are better than one: Context window extension with multi-grained self-injection | |
Towards Knowledge Checking in Retrieval-augmented Generation: A Representation Perspective | |
Ranking Unraveled: Recipes for LLM Rankings in Head-to-Head AI Combat | |
Unveiling the Hidden Structure of Self-Attention via Kernel Principal Component Analysis | |
On Domain-Specific Post-Training for Multimodal Large Language Models | |
Reverse Thinking Makes LLMs Stronger Reasoners | |
Advanced System Integration: Analyzing OpenAPI Chunking for Retrieval-Augmented Generation | |
Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems | |
LLM Teacher-Student Framework for Text Classification With No Manually Annotated Data: A Case Study in IPTC News Topic Classification | |
Does Representation Matter? Exploring Intermediate Layers in Large Language Models | |
TQA-Bench: Evaluating LLMs for Multi-Table Question Answering with Scalable Context and Symbolic Extension | |
Zero-Indexing Internet Search Augmented Generation for Large Language Models | |
Towards Understanding Retrieval Accuracy and Prompt Quality in RAG Systems | |
Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models | |
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs | |
Mars-PO: Multi-Agent Reasoning System Preference Optimization | |
ICLERB: In-Context Learning Embedding and Reranker Benchmark | |
MATATA: a weak-supervised MAthematical Tool-Assisted reasoning for Tabular Applications | |
Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS | |
Leveraging Retrieval-Augmented Generation for University Knowledge Retrieval | |
Yi-Lightning Technical Report | |
Sneaking Syntax into Transformer Language Models with Tree Regularization | |
T-REG: Preference Optimization with Token-Level Reward Regularization | |
Gradient Localization Improves Lifelong Pretraining of Language Models | |
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information? | |
OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation | |
Free Process Rewards without Process Labels | |
X-Prompt: Towards Universal In-Context Image Generation in Auto-Regressive Vision Language Foundation Models | |
The Evolution of RWKV: Advancements in Efficient Language Modeling | |
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models | |
[CLS] Attention is All You Need for Training-Free Visual Token Pruning: Make VLM Inference Faster | |
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant | |
MBA-RAG: a Bandit Approach for Adaptive Retrieval-Augmented Generation through Question Complexity | |
COAP: Memory-Efficient Training with Correlation-Aware Gradient Projection | |
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences | |
CPRM: A LLM-based Continual Pre-training Framework for Relevance Modeling in Commercial Search | |
Ponder & Press: Advancing Visual GUI Agent towards General Computer Control | |
Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input | |
Language Models Encode Numbers Using Digit Representations in Base 10 | |
VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception of Geometric Information | |
VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by Video Spatiotemporal Augmentation | |
Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting | |
DynRank: Improving Passage Retrieval with Dynamic Zero-Shot Prompting Based on Question Classification | |
Enhancing Zero-shot Chain of Thought Prompting via Uncertainty-Guided Strategy Selection | |
Baba Is AI: Break the Rules to Beat the Benchmark | |
Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability | |
VLSBench: Unveiling Visual Leakage in Multimodal Safety | |
INCLUDE: Evaluating Multilingual Language Understanding with Regional Knowledge | |
A Simple and Provable Scaling Law for the Test-Time Compute of Large Language Models | |
GATE OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation | |
STAR: Synthesis of Tailored Architectures | |
Best-of-N Jailbreaking | |
PaliGemma 2: A Family of Versatile VLMs for Transfer | |
Beyond Questions: Leveraging ColBERT for Keyphrase Search | |
RedStone: Curating General, Code, Math, and QA Data for Large Language Models | |
A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for accelerating Large VLMs | |
FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness | |
AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning | |
A dynamic parallel method for performance optimization on hybrid CPUs | |
Weighted-Reward Preference Optimization for Implicit Model Fusion | |
Surveying the Effects of Quality, Diversity, and Complexity in Synthetic Data From Large Language Models | |
Does Few-Shot Learning Help LLM Performance in Code Synthesis? | |
RARE: Retrieval-Augmented Reasoning Enhancement for Large Language Models | |
Time-Reversal Provides Unsupervised Feedback to LLMs | |
Self-Improvement in Language Models: The Sharpening Mechanism | |
Down with the Hierarchy: The 'H' in HNSW Stands for "Hubs" | |
MALT: Improving Reasoning with Multi-Agent LLM Training | |
Generating a Low-code Complete Workflow via Task Decomposition and RAG | |
Truth or Mirage? Towards End-to-End Factuality Evaluation with LLM-Oasis | |
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning | |
U-MATH: A University-Level Benchmark for Evaluating Mathematical Skills in LLMs | |
QA-TOOLBOX: Conversational Question-Answering for process task guidance in manufacturing | |
Explainable CTR Prediction via LLM Reasoning | |
Video-3D LLM: Learning Position-Aware Video Representation for 3D Scene Understanding | |
VARCO-VISION: Expanding Frontiers in Korean Vision-Language Models | |
VisionZip: Longer is Better but Not Necessary in Vision Language Models | |
Florence-VL: Enhancing Vision-Language Models with Generative Vision Encoder and Depth-Breadth Fusion | |
Retrieval-Augmented Machine Translation with Unstructured Knowledge | |
Densing Law of LLMs | |
Monet: Mixture of Monosemantic Experts for Transformers | |
Marco-LLM: Bridging Languages via Massive Multilingual Training for Cross-Lingual Enhancement | |
A Survey on Large Language Model-Based Social Agents in Game-Theoretic Scenarios | |
Domain-specific Question Answering with Hybrid Search | |
Evaluating Language Models as Synthetic Data Generators | |
Theoretical limitations of multi-layer Transformer | |
VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding | |
Personalized Multimodal Large Language Models: A Survey | |
Optimal Memorization Capacity of Transformers | |
NVILA: Efficient Frontier Visual Language Models | |
p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | |
Discriminative Fine-tuning of LVLMs | |
Challenges in Trustworthy Human Evaluation of Chatbots | |
ALMA: Alignment with Minimal Annotation | |
Global MMLU: Understanding and Addressing Cultural and Linguistic Biases in Multilingual Evaluation | |
KV Shifting Attention Enhances Language Modeling | |
Establishing Task Scaling Laws via Compute-Efficient Model Ladders | |
Evolutionary Pre-Prompt Optimization for Mathematical Reasoning | |
Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges | |
Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling | |
APOLLO: SGD-like Memory, AdamW-level Performance | |
CompCap: Improving Multimodal Large Language Models with Composite Captions | |
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale | |
In Tree Structure Should Sentence Be Generated | |
BEExformer: A Fast Inferencing Transformer Architecture via Binarization with Multiple Early Exits | |
ConQRet: Benchmarking Fine-Grained Evaluation of Retrieval Augmented Argumentation with LLM Judges | |
LinVT: Empower Your Image-level Large Language Model to Understand Videos | |
Transformers Can Navigate Mazes With Multi-Step Prediction | |
Gated Delta Networks: Improving Mamba2 with Delta Rule | |
Frontier Models are Capable of In-context Scheming | |
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners | |
DEMO: Reframing Dialogue Interaction with Fine-grained Element Modeling | |
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases | |
Transformers Struggle to Learn to Search | |
The Hyperfitting Phenomenon: Sharpening and Stabilizing LLMs for Open-Ended Text Generation | |
Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models | |
Mixture-of-PageRanks: Replacing Long-Context with Real-Time, Sparse GraphRAG | |
Flex Attention: A Programming Model for Generating Optimized Attention Kernels | |
Training Large Language Models to Reason in a Continuous Latent Space | |
ProcessBench: Identifying Process Errors in Mathematical Reasoning | |
Exploring Multi-Grained Concept Annotations for Multimodal Large Language Models | |
If You Can't Use Them, Recycle Them: Optimizing Merging at Scale Mitigates Performance Tradeoffs | |
Robust Multi-bit Text Watermark with LLM-based Paraphrasers | |
An Evolved Universal Transformer Memory | |
Mixture of Hidden-Dimensions Transformer | |
Granite Guardian | |
Adapting to Non-Stationary Environments: Multi-Armed Bandit Enhanced Retrieval-Augmented Generation on Knowledge Graphs | |
Automatic Database Configuration Debugging using Retrieval-Augmented Language Models | |
Large Language Models are Biased Because They Are Large Language Models | |
Maya: An Instruction Finetuned Multilingual Multimodal Model | |
Fully Open Source Moxin-7B Technical Report | |
ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance | |
Evaluating and Aligning CodeLLMs on Human Preference | |
Pre-trained Language Models Return Distinguishable Probability Distributions to Unfaithfully Hallucinated Texts | |
Perception Tokens Enhance Visual Reasoning in Multimodal Language Models | |
Tool Learning with Foundation Models | |
POINTS1.5: Building a Vision-Language Model towards Real World Applications | |
3DSRBench: A Comprehensive 3D Spatial Reasoning Benchmark | |
Is Personality Prediction Possible Based on Reddit Comments? | |
Contextualized Counterspeech: Strategies for Adaptation, Personalization, and Evaluation | |
Frame Representation Hypothesis: Multi-Token LLM Interpretability and Concept-Guided Text Generation | |
HARP: Hesitation-Aware Reframing in Transformer Inference Pass | |
Chimera: Improving Generalist Model with Domain-Specific Experts | |
StreamChat: Chatting with Streaming Video | |
KaSA: Knowledge-Aware Singular-Value Adaptation of Large Language Models | |
The BrowserGym Ecosystem for Web Agent Research | |
Advancing Single- and Multi-task Text Classification through Large Language Model Fine-tuning | |
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images | |
From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons | |
Forest-of-Thought: Scaling Test-Time Compute for Enhancing LLM Reasoning | |
Phi-4 Technical Report | |
Large Concept Models: Language Modeling in a Sentence Representation Space | |
Test-Time Alignment via Hypothesis Reweighting | |
LatentQA: Teaching LLMs to Decode Activations Into Natural Language | |
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions | |
The Impact of Copyrighted Material on Large Language Models: A Norwegian Perspective | |
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity Visual Descriptions | |
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation | |
JuStRank: Benchmarking LLM Judges for System Ranking | |
RuleArena: A Benchmark for Rule-Guided Reasoning with LLMs in Real-World Scenarios | |
VisionArena: 230K Real World User-VLM Conversations with Preference Labels | |
Semantic Retrieval at Walmart | |
Fine-Tuning Language Models with Advantage-Induced Policy Alignment | |
LIBER: Lifelong User Behavior Modeling Based on Large Language Models | |
FuseGPT: Learnable Layers Fusion of Generative Pre-trained Transformers | |
Understanding World or Predicting Future? A Comprehensive Survey of World Models | |
Improving training time and GPU utilization in geo-distributed language model training | |
A Survey of Financial AI: Architectures, Advances and Open Challenges | |
Gini Coefficient as a Unified Metric for Evaluating Many-versus-Many Similarity in Vector Spaces | |
Large Language Models as Neurolinguistic Subjects: Identifying Internal Representations for Form and Meaning | |
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models | |
Adapting Language Models via Token Translation | |
Length-Induced Embedding Collapse in Transformer-based Models | |
L3Ms -- Lagrange Large Language Models | |
Efficient Training of Sparse Autoencoders for Large Language Models via Layer Groups | |
BIG5-CHAT: Shaping LLM Personalities Through Training on Human-Grounded Data | |
Subspace Optimization for Large Language Models with Convergence Guarantees | |
Liger Kernel: Efficient Triton Kernels for LLM Training | |
Towards Linguistically-Aware and Language-Independent Tokenization for Large Language Models (LLMs) | |
Defining Knowledge: Bridging Epistemology and Large Language Models | |
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | |
'Simulacrum of Stories': Examining Large Language Models as Qualitative Research Participants | |
Self-attention as an attractor network: transient memories without backpropagation | |
AI Can Be Cognitively Biased: An Exploratory Study on Threshold Priming in LLM-Based Batch Relevance Assessment | |
A Large Language Model and Denoising Diffusion Framework for Targeted Design of Microstructures with Commands in Natural Language | |
Guided Profile Generation Improves Personalization with LLMs | |
Is Contrasting All You Need? Contrastive Learning for the Detection and Attribution of AI-generated Text | |
Re-Introducing LayerNorm: Geometric Meaning, Irreversibility and a Comparative Study with RMSNorm | |
Personality Alignment of Large Language Models | |
RazorAttention: Efficient KV Cache Compression Through Retrieval Heads | |
The Hitchhiker's Guide to Human Alignment with *PO | |
Beyond Benchmarking: A New Paradigm for Evaluation and Assessment of Large Language Models | |
Establishing Knowledge Preference in Language Models | |
Do LLMs have Consistent Values? | |
The Foundations of Tokenization: Statistical and Computational Concerns | |
Curriculum Learning for Small Code Language Models | |
Optimized Multi-Token Joint Decoding with Auxiliary Model for LLM Inference | |
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | |
A Survey of Controllable Learning: Methods and Applications in Information Retrieval | |
LLM Internal States Reveal Hallucination Risk Faced With a Query | |
Efficient Sparse Attention needs Adaptive Token Release | |
LLM Uncertainty Quantification through Directional Entailment Graph and Claim Level Response Augmentation | |
AutoPal: Autonomous Adaptation to Users for Personal AI Companionship | |
It's Morphing Time: Unleashing the Potential of Multiple LLMs via Multi-objective Optimization | |
LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods | |
Mental Modeling of Reinforcement Learning Agents by Language Models | |
SimSMoE: Solving Representational Collapse via Similarity Measure | |
Hybrid Alignment Training for Large Language Models | |
Understanding the RoPE Extensions of Long-Context LLMs: An Attention Perspective | |
LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation | |
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling | |
Adaptive Token Biaser: Knowledge Editing via Biasing Key Entities | |
Abstraction-of-Thought Makes Language Models Better Reasoners | |
Attention Score is not All You Need for Token Importance Indicator in KV Cache Reduction: Value Also Matters | |
Promises, Outlooks and Challenges of Diffusion Language Modeling | |
CodeGemma: Open Code Models Based on Gemma | |
What Kinds of Tokens Benefit from Distant Text? An Analysis on Long Context Language Modeling | |
A Survey on Human Preference Learning for Large Language Models | |
Exploring the Zero-Shot Capabilities of LLMs Handling Multiple Problems at once | |
SoK: Decentralized AI (DeAI) | |
Meaningless is better: hashing bias-inducing words in LLM prompts improves performance in logical reasoning and statistical learning | |
The Two-Hop Curse: LLMs trained on A->B, B->C fail to learn A->C | |
Adaptive Circuit Behavior and Generalization in Mechanistic Interpretability | |
The Zamba2 Suite: Technical Report | |
Comparative Analysis of Pooling Mechanisms in LLMs: A Sentiment Analysis Perspective | |
Planning-Driven Programming: A Large Language Model Programming Workflow | |
Logic Augmented Generation | |
Orca: Enhancing Role-Playing Abilities of Large Language Models by Integrating Personality Traits | |
On the Limits of Language Generation: Trade-Offs Between Hallucination and Mode Collapse | |
Theoretical Analysis of Byte-Pair Encoding | |
Are LLMs Prescient? A Continuous Evaluation using Daily News as the Oracle | |
LLMPhy: Complex Physical Reasoning Using Large Language Models and World Models | |
Efficient Federated Finetuning of Tiny Transformers with Resource-Constrained Devices | |
Warmstarting for Scaling Language Models | |
TreeCoders: Trees of Transformers | |
Token2Wave | |
Enhancing Transformer Training Efficiency with Dynamic Dropout | |
DroidSpeak: Enhancing Cross-LLM Communication | |
Ask, and it shall be given: Turing completeness of prompting | |
Can Language Models Learn to Skip Steps? | |
Unlocking the Theory Behind Scaling 1-Bit Neural Networks | |
Thought Space Explorer: Navigating and Expanding Thought Space for Large Language Model Reasoning | |
All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling | |
Social Science Meets LLMs: How Reliable Are Large Language Models in Social Simulations? | |
Moral Agency in Silico: Exploring Free Will in Large Language Models | |
Personas with Attitudes: Controlling LLMs for Diverse Data Annotation | |
Towards Infinite-Long Prefix in Transformer | |
Glider: Global and Local Instruction-Driven Expert Router | |
Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection | |
An Implementation of Werewolf Agent That does not Truly Trust LLMs | |
ReAttention: Training-Free Infinite Context with Finite Attention Scope | |
Collective Innovation in Groups of Large Language Models | |
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | |
See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding | |
LEGO: Language Model Building Blocks | |
LMLPA: Language Model Linguistic Personality Assessment | |
Investigating Sensitive Directions in GPT-2: An Improved Baseline and Comparative Analysis of SAEs | |
Unifying Economic and Language Models for Enhanced Sentiment Analysis of the Oil Market | |
Light-Weight Fault Tolerant Attention for Large Language Model Training | |
Understanding Likelihood Over-optimisation in Direct Alignment Algorithms | |
Varying Shades of Wrong: Aligning LLMs with Wrong Answers Only | |
Personality Differences Drive Conversational Dynamics: A High-Dimensional NLP Approach | |
A Unified Approach to Routing and Cascading for LLMs | |
Evaluating Language Model Character Traits | |
LoRTA: Low Rank Tensor Adaptation of Large Language Models | |
Using Prompts to Guide Large Language Models in Imitating a Real Person's Language Style | |
Kiss up, Kick down: Exploring Behavioral Changes in Multi-modal Large Language Models with Assigned Visual Personas | |
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models | |
Meta-Models: An Architecture for Decoding LLM Behaviors Through Interpreted Embeddings and Natural Language | |
On The Adaptation of Unlimiformer for Decoder-Only Transformers | |
Lines of Thought in Large Language Models | |
PersonalLLM: Tailoring LLMs to Individual Preferences | |
Characterizing stable regions in the residual stream of LLMs | |
Counterfactual Token Generation in Large Language Models | |
Forking Paths in Neural Text Generation | |
LLM Echo Chamber: personalized and automated disinformation | |
HLB: Benchmarking LLMs' Humanlikeness in Language Use | |
Supervised Fine-Tuning Achieve Rapid Task Adaption Via Alternating Attention Head Activation Patterns | |
Backtracking Improves Generation Safety | |
Flat-LoRA: Low-Rank Adaption over a Flat Loss Landscape | |
Investigating Layer Importance in Large Language Models | |
Uncovering Latent Chain of Thought Vectors in Language Models | |
Using Large Language Models to Create AI Personas for Replication and Prediction of Media Effects: An Empirical Test of 133 Published Experimental Research Findings | |
Spin glass model of in-context learning | |
Scaling Embedding Layers in Language Models | |
Advancing Prompt Learning through an External Layer | |
Effects of Scale on Language Model Robustness | |
Relating the Seemingly Unrelated: Principled Understanding of Generalization for Generative Models in Arithmetic Reasoning Tasks | |
Financial Statement Analysis with Large Language Models | |
AI TrackMate: Finally, Someone Who Will Give Your Music More Than Just "Sounds Great!" | |
Graph-Structured Speculative Decoding | |
On the Benefits of Rank in Attention Layers | |
Do Large Language Models Have Compositional Ability? An Investigation into Limitations and Scalability | |
Supporting the Digital Autonomy of Elders Through LLM Assistance | |
Psychometric Alignment: Capturing Human Knowledge Distributions via Language Models | |
Dissecting Multiplication in Transformers: Insights into LLMs | |
Open Artificial Knowledge | |
Combining Constraint Programming Reasoning with Large Language Model Predictions | |
Transformer-based Single-Cell Language Model: A Survey | |
Compressed models are NOT miniature versions of large models | |
Beyond KV Caching: Shared Attention for Efficient LLMs | |
Harnessing the Power of Artificial Intelligence to Vitalize Endangered Indigenous Languages: Technologies and Experiences | |
Struct-X: Enhancing Large Language Models Reasoning with Structured Data | |
The Better Angels of Machine Personality: How Personality Relates to LLM Safety | |
Efficiently Training 7B LLM with 1 Million Sequence Length on 8 GPUs | |
Large Language Models as Misleading Assistants in Conversation | |
Apollo: An Exploration of Video Understanding in Large Multimodal Models | |
SCBench: A KV Cache-Centric Analysis of Long-Context Methods | |
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | |
Large Action Models: From Inception to Implementation | |
On Implications of Scaling Laws on Feature Superposition | |
Hey, That's My Model! Introducing Chain & Hash, An LLM Fingerprinting Technique | |
Weighted Grouped Query Attention in Transformers | |
MetaLLM: A High-performant and Cost-efficient Dynamic Framework for Wrapping LLMs | |
BiasScanner: Automatic Detection and Classification of News Bias to Strengthen Democracy | |
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts | |
Self-Evolving GPT: A Lifelong Autonomous Experiential Learner | |
Real-Time Anomaly Detection and Reactive Planning with Large Language Models | |
On the Universal Truthfulness Hyperplane Inside LLMs | |
A Review of the Challenges with Massive Web-mined Corpora Used in Large Language Models Pre-Training | |
Bucket Pre-training is All You Need | |
Induction Heads as an Essential Mechanism for Pattern Matching in In-context Learning | |
SoftDedup: an Efficient Data Reweighting Method for Speeding Up Language Model Pre-training | |
Virtual Personas for Language Models via an Anthology of Backstories | |
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons | |
Optimal Decision Making Through Scenario Simulations Using Large Language Models | |
Using Grammar Masking to Ensure Syntactic Validity in LLM-based Modeling Tasks | |
Lazarus: Resilient and Elastic Training of Mixture-of-Experts Models with Adaptive Expert Placement | |
Stephanie: Step-by-Step Dialogues for Mimicking Human Interactions in Social Conversations | |
Improving Self Consistency in LLMs through Probabilistic Tokenization | |
Over the Edge of Chaos? Excess Complexity as a Roadblock to Artificial General Intelligence | |
The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model | |
Fine-Tuning with Divergent Chains of Thought Boosts Reasoning Through Self-Correction in Language Models | |
Raw Text is All you Need: Knowledge-intensive Multi-turn Instruction Tuning for Large Language Model | |
Large Language Models as Evaluators for Scientific Synthesis | |
Efficient Training of Language Models with Compact and Consistent Next Token Distributions | |
Learning to Reduce: Towards Improving Performance of Large Language Models on Structured Data | |
Large Language Model Enhanced Knowledge Representation Learning: A Survey | |
Generative Monoculture in Large Language Models | |
Black Big Boxes: Do Language Models Hide a Theory of Adjective Order? | |
DynaThink: Fast or Slow? A Dynamic Decision-Making Framework for Large Language Models | |
Dynamic Universal Approximation Theory: The Basic Theory for Transformer-based Large Language Models | |
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | |
LEMoE: Advanced Mixture of Experts Adaptor for Lifelong Model Editing of Large Language Models | |
Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model | |
Debate-to-Write: A Persona-Driven Multi-Agent Framework for Diverse Argument Generation | |
Mixture of In-Context Experts Enhance LLMs' Long Context Awareness | |
The Rise of Artificial Intelligence in Educational Measurement: Opportunities and Ethical Challenges | |
MammothModa: Multi-Modal Large Language Model | |
Encourage or Inhibit Monosemanticity? Revisit Monosemanticity from a Feature Decorrelation Perspective | |
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance | |
Understanding and Mitigating Tokenization Bias in Language Models | |
Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs | |
Large Vocabulary Size Improves Large Language Models | |
The Responsible Foundation Model Development Cheatsheet: A Review of Tools & Resources | |
Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging | |
What Do VLMs NOTICE? A Mechanistic Interpretability Pipeline for Gaussian-Noise-free Text-Image Corruption and Evaluation | |
Does Cross-Cultural Alignment Change the Commonsense Morality of Language Models? | |
FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models | |
Unlocking the Future: Exploring Look-Ahead Planning Mechanistic Interpretability in Large Language Models | |
Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning | |
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration | |
Large Language Models have Intrinsic Self-Correction Ability | |
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation | |
Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics | |
Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions | |
Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas | |
Ranking LLMs by compression | |
SPL: A Socratic Playground for Learning Powered by Large Language Model | |
Distributional reasoning in LLMs: Parallel reasoning processes in multi-hop reasoning | |
Elliptical Attention | |
In-Context Former: Lightning-fast Compressing Context for Large Language Model | |
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | |
Locating and Extracting Relational Concepts in Large Language Models | |
Amphista: Bi-directional Multi-head Decoding for Accelerating LLM Inference | |
Synergizing Foundation Models and Federated Learning: A Survey | |
Can Large Language Models Always Solve Easy Problems if They Can Solve Harder Ones? | |
What Makes Two Language Models Think Alike? | |
P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | |
Compressed Chain of Thought: Efficient Reasoning Through Dense Representations | |
Is persona enough for personality? Using ChatGPT to reconstruct an agent's latent personality from simple descriptions | |
LLMs Are Prone to Fallacies in Causal Inference | |
Compact Proofs of Model Performance via Mechanistic Interpretability | |
Towards an End-to-End Framework for Invasive Brain Signal Decoding with Large Language Models | |
MetaGPT: Merging Large Language Models Using Model Exclusive Task Arithmetic | |
A Survey on Large Language Model-based Agents for Statistics and Data Science | |
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | |
A Peek into Token Bias: Large Language Models Are Not Yet Genuine Reasoners | |
Multilingual Large Language Models and Curse of Multilinguality | |
Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models | |
Personalized Pieces: Efficient Personalized Large Language Models through Collaborative Efforts | |
3D-RPE: Enhancing Long-Context Modeling Through 3D Rotary Position Encoding | |
Federated Learning driven Large Language Models for Swarm Intelligence: A Survey | |
Developing Safe and Responsible Large Language Model: Can We Balance Bias Reduction and Language Understanding in Large Language Models? | |
Cofca: A Step-Wise Counterfactual Multi-hop QA benchmark | |
Future Lens: Anticipating Subsequent Tokens from a Single Hidden State | |
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers | |
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding | |
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs | |
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities | |
Iris: Breaking GUI Complexity with Adaptive Focus and Self-Refining | |
AdvPrefix: An Objective for Nuanced LLM Jailbreaks | |
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation | |
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models | |
Llama 3 Meets MoE: Efficient Upcycling | |
Memory Layers at Scale | |
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers | |
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation | |
Memory Transformer | |
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | |
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models | |
Whisper-GPT: A Hybrid Representation Audio Large Language Model | |
Smaller Language Models Are Better Instruction Evolvers | |
Investigating Mixture of Experts in Dense Retrieval | |
Let your LLM generate a few tokens and you will reduce the need for retrieval | |
RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models | |
No More Adam: Learning Rate Scaling at Initialization is All You Need | |
Entropy-Regularized Process Reward Model | |
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator | |
The Open Source Advantage in Large Language Models (LLMs) | |
GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training | |
Understanding Knowledge Hijack Mechanism in In-context Learning through Associative Memory | |
Superhuman performance of a large language model on the reasoning tasks of a physician | |
Reinforcement Learning Enhanced LLMs: A Survey | |
Byte Latent Transformer: Patches Scale Better Than Tokens | |
TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back) using Taylor-Softmax | |
Continual Pre-Training of Large Language Models: How to (re)warm your model? | |
Are Your LLMs Capable of Stable Reasoning? | |
An Agentic Approach to Automatic Creation of P&ID Diagrams from Natural Language Descriptions | |
Multi-Dimensional Insights: Benchmarking Real-World Personalization in Large Multimodal Models | |
AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark | |
Proposer-Agent-Evaluator(PAE): Autonomous Skill Discovery For Foundation Model Internet Agents | |
OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain | |
EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation | |
When to Speak, When to Abstain: Contrastive Decoding with Abstention | |
RAG Playground: A Framework for Systematic Evaluation of Retrieval Strategies and Prompt Engineering in RAG Systems | |
Emergence of Abstractions: Concept Encoding and Decoding Mechanism for In-Context Learning in Transformers | |
Mastering Board Games by External and Internal Planning with Language Models | |
Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework | |
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation | |
Cultural Evolution of Cooperation among LLM Agents | |
Legommenders: A Comprehensive Content-Based Recommendation Library with LLM Support | |
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks | |
GUI Agents: A Survey | |
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment | |
LMUnit: Fine-grained Evaluation with Natural Language Unit Tests | |
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces | |
MetaMorph: Multimodal Understanding and Generation via Instruction Tuning | |
Alignment faking in large language models | |
CAD-Recode: Reverse Engineering CAD Code from Point Clouds | |
Prompting Strategies for Enabling Large Language Models to Infer Causation from Correlation | |
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer | |
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN | |
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge | |
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference | |
EscapeBench: Pushing Language Models to Think Outside the Box | |
FastVLM: Efficient Vision Encoding for Vision Language Models | |
LongBench v2: Towards Deeper Understanding and Reasoning on Realistic Long-context Multitasks | |
Progressive Multimodal Reasoning via Active Retrieval | |
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | |
Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings | |
Sliding Windows Are Not the End: Exploring Full Ranking with Long-Context Large Language Models | |
Qwen2.5 Technical Report | |
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling | |
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | |
How to Synthesize Text Data without Model Collapse? | |
TOMG-Bench: Evaluating LLMs on Text-based Open Molecule Generation | |
A Closer Look at the Limitations of Instruction Tuning | |
Rethinking Uncertainty Estimation in Natural Language Generation | |
Knowledge Injection via Prompt Distillation | |
HashAttention: Semantic Sparsity for Faster Inference | |
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective | |
SWAN: Preprocessing SGD Enables Adam-Level Performance On LLM Training With Significant Memory Reduction | |
BA-LoRA: Bias-Alleviating Low-Rank Adaptation to Mitigate Catastrophic Inheritance in Large Language Models | |
A Survey of Mathematical Reasoning in the Era of Multimodal Large Language Model: Benchmark, Method & Challenges | |
Do Large Language Models Defend Inferentialist Semantics?: On the Logical Expressivism and Anti-Representationalism of LLMs | |
Large Language Model Enhanced Recommender Systems: Taxonomy, Trend, Application and Future | |
Explainable Procedural Mistake Detection | |
SCOPE: Optimizing Key-Value Cache Compression in Long-context Generation | |
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding | |
Offline Reinforcement Learning for LLM Multi-Step Reasoning | |
Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks | |
XRAG: eXamining the Core -- Benchmarking Foundational Components in Advanced Retrieval-Augmented Generation | |
Fietje: An open, efficient LLM for Dutch | |
SimGRAG: Leveraging Similar Subgraphs for Knowledge Graphs Driven Retrieval-Augmented Generation | |
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design | |
PruneVid: Visual Token Pruning for Efficient Video Large Language Models | |
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps | |
HREF: Human Response-Guided Evaluation of Instruction Following in Language Models | |
Multi-LLM Text Summarization | |
Inference-Aware Fine-Tuning for Best-of-N Sampling in Large Language Models | |
Diving into Self-Evolving Training for Multimodal Reasoning | |
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners | |
Revisiting In-Context Learning with Long Context Language Models | |
NILE: Internal Consistency Alignment in Large Language Models | |
Formal Mathematical Reasoning: A New Frontier in AI | |
A Systematic Examination of Preference Learning through the Lens of Instruction-Following | |
WebLLM: A High-Performance In-Browser LLM Inference Engine | |
Collaborative Gym: A Framework for Enabling and Evaluating Human-Agent Collaboration | |
MathSpeech: Leveraging Small LMs for Accurate Conversion in Mathematical Speech-to-Formula | |
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage | |
Maximize Your Data's Potential: Enhancing LLM Accuracy with Two-Phase Pretraining | |
Associative memory inspires improvements for in-context learning using a novel attention residual stream architecture | |
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response | |
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization | |
Deliberation in Latent Space via Differentiable Cache Augmentation | |
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought | |
Efficient fine-tuning methodology of text embedding models for information retrieval: contrastive learning penalty (clp) | |
The HalluRAG Dataset: Detecting Closed-Domain Hallucinations in RAG Applications Using an LLM's Internal States | |
GME: Improving Universal Multimodal Retrieval by Multimodal LLMs | |
OpenAI o1 System Card | |
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval | |
LearnLM: Improving Gemini for Learning | |
Outcome-Refining Process Supervision for Code Generation | |
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World | |
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning | |
Agent-SafetyBench: Evaluating the Safety of LLM Agents | |
ResearchTown: Simulator of Human Research Community | |
Rate of Model Collapse in Recursive Training | |
Cross-Lingual Text-Rich Visual Comprehension: An Information Theory Perspective | |
YuLan-Mini: An Open Data-efficient Language Model | |
A Survey of Query Optimization in Large Language Models | |
SKETCH: Structured Knowledge Enhanced Text Comprehension for Holistic Retrieval | |
Harnessing Large Language Models for Knowledge Graph Question Answering via Adaptive Multi-Aspect Retrieval-Augmentation | |
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization | |
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding | |
In Case You Missed It: ARC 'Challenge' Is Not That Challenging | |
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning | |
Improving Factuality with Explicit Working Memory | |
Efficient Long Context Language Model Retrieval with Compression | |
Token-Budget-Aware LLM Reasoning | |
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search | |
Fooling LLM graders into giving better grades through neural activity guided adversarial prompting | |
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation | |
A Silver Bullet or a Compromise for Full Attention? A Comprehensive Study of Gist Token-based Context Compression | |
Gradient Weight-normalized Low-rank Projection for Efficient LLM Training | |
Instruction Fine-Tuning: Does Prompt Loss Matter? | |
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment | |
RecLM: Recommendation Instruction Tuning | |
Multi-matrix Factorization Attention | |
Jasper and Stella: distillation of SOTA embedding models | |
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs | |
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation | |
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs | |
Facilitating large language model Russian adaptation with Learned Embedding Propagation | |
Training Software Engineering Agents and Verifiers with SWE-Gym | |
Efficiently Serving LLM Reasoning Programs with Certaindex | |
GASLITEing the Retrieval: Exploring Vulnerabilities in Dense Embedding-based Search | |
On the Compositional Generalization of Multimodal LLMs for Medical Imaging | |
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System | |
InfAlign: Inference-aware language model alignment | |
Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging | |
Dynamic Skill Adaptation for Large Language Models | |
Long-Range Tasks Using Short-Context LLMs: Incremental Reasoning With Structured Memories | |
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era | |
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey | |
In-context Continual Learning Assisted by an External Continual Learner | |
LLM2: Let Large Language Models Harness System 2 Reasoning | |
Precise Length Control in Large Language Models | |
Can LLMs Convert Graphs to Text-Attributed Graphs? | |
Using Generative AI and Multi-Agents to Provide Automatic Feedback | |
Agents Are Not Enough | |
Smoothie: Label Free Language Model Routing | |
Xmodel-2 Technical Report | |
LLM-as-an-Interviewer: Beyond Static Testing Through Dynamic LLM Evaluation | |
HUNYUANPROVER: A Scalable Data Synthesis Framework and Guided Tree Search for Automated Theorem Proving | |
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios | |
ProgCo: Program Helps Self-Correction of Large Language Models | |
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings | |
A3: Android Agent Arena for Mobile GUI Agents | |
Dynamic Scaling of Unit Tests for Code Reward Modeling | |
KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model | |
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM | |
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | |
MLLM-as-a-Judge for Image Safety without Human Labeling | |
MapQaTor: A System for Efficient Annotation of Map Query Datasets | |
Are Vision-Language Models Truly Understanding Multi-vision Sensor? | |
BitDistiller: Unleashing the Potential of Sub-4-Bit LLMs via Self-Distillation | |
TrustRAG: Enhancing Robustness and Trustworthiness in RAG | |
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding | |
RAG-Instruct: Boosting LLMs with Diverse Retrieval-Augmented Instructions | |
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | |
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation | |
Titans: Learning to Memorize at Test Time | |
Unifying Specialized Visual Encoders for Video Language Models | |
IGC: Integrating a Gated Calculator into an LLM to Solve Arithmetic Tasks Reliably and Efficiently | |
Low-Rank Adaptation for Foundation Models: A Comprehensive Review | |
Speech Recognition With LLMs Adapted to Disordered Speech Using Reinforcement Learning | |
Can LLMs Design Good Questions Based on Context? | |
2 OLMo 2 Furious | |
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking | |
LUSIFER: Language Universal Space Integration for Enhanced Multilingual Embeddings with Large Language Models | |
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction | |
Enhancing Human-Like Responses in Large Language Models | |
Metadata Conditioning Accelerates Language Model Pre-training | |
Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap | |
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM | |
QuArch: A Question-Answering Dataset for AI Agents in Computer Architecture | |
SDPO: Segment-Level Direct Preference Optimization for Social Agents | |
Predicting the Performance of Black-box LLMs through Self-Queries | |
Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information | |
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery | |
Reinforcing Thinking through Reasoning-Enhanced Reward Models | |
ICLR: In-Context Learning of Representations | |
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning | |
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction | |
HALO: Hadamard-Assisted Lossless Optimization for Efficient Low-Precision LLM Training and Fine-Tuning | |
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use | |
Test-time Computing: from System-1 Thinking to System-2 Thinking | |
Scaling Laws for Floating Point Quantization Training | |
Benchmark Evaluations, Applications, and Challenges of Large Vision Language Models: A Survey | |
Personalized Graph-Based Retrieval for Large Language Models | |
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models | |
Understanding How CodeLLMs (Mis)Predict Types with Activation Steering | |
Instruction-Following Pruning for Large Language Models | |
GeAR: Generation Augmented Retrieval | |
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos | |
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | |
MTRAG: A Multi-Turn Conversational Benchmark for Evaluating Retrieval-Augmented Generation Systems | |
PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models | |
DeepSeek-V3 Technical Report | |
Dataset Decomposition: Faster LLM Training with Variable Sequence Length Curriculum | |
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models | |
Segmenting Text and Learning Their Rewards for Improved RLHF in Language Model | |
Graph-Aware Isomorphic Attention for Adaptive Dynamics in Transformers | |
OpenOmni: Large Language Models Pivot Zero-shot Omnimodal Alignment across Language with Real-time Self-Aware Emotional Speech Synthesis | |
Entropy-Guided Attention for Private LLMs | |
Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Though | |
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection | |
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking | |
Agent Laboratory: Using LLM Agents as Research Assistants | |
URSA: Understanding and Verifying Chain-of-thought Reasoning in Multimodal Mathematics | |
LLM4SR: A Survey on Large Language Models for Scientific Research | |
Repository Structure-Aware Training Makes SLMs Better Issue Resolver | |
DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization | |
EpiCoder: Encompassing Diversity and Complexity in Code Generation | |
Multi-task retriever fine-tuning for domain-specific and efficient RAG | |
Reasoning-Enhanced Self-Training for Long-Form Personalized Text Generation | |
Who Does the Giant Number Pile Like Best: Analyzing Fairness in Hiring Contexts | |
Search-o1: Agentic Search-Enhanced Large Reasoning Models | |
ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding | |
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model | |
SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution | |
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study | |
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models | |
Concept Boundary Vectors | |
A Survey of RWKV | |
Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning | |
Large-scale Group Brainstorming using Conversational Swarm Intelligence (CSI) versus Traditional Chat | |
Large Language Model is Secretly a Protein Sequence Optimizer | |
Experience of Training a 1.7B-Parameter LLaMa Model From Scratch | |
Frontier AI systems have surpassed the self-replicating red line | |
Large Language Models show both individual and collective creativity comparable to humans | |
Does Self-Attention Need Separate Weights in Transformers? | |
VideoRAG: Retrieval-Augmented Generation over Video Corpus | |
A Survey on Large Language Models with some Insights on their Capabilities and Limitations | |
SUGAR: Leveraging Contextual Confidence for Smarter Retrieval | |
On The Origin of Cultural Biases in Language Models: From Pre-training Data to Linguistic Phenomena | |
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs | |
Valley2: Exploring Multimodal Models with Scalable Vision-Language Design | |
Small Language Models (SLMs) Can Still Pack a Punch: A survey | |
Enabling Scalable Oversight via Self-Evolving Critic | |
Multiagent Finetuning: Self Improvement with Diverse Reasoning Chains | |
Infecting Generative AI With Viruses | |
OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding? | |
The Future of AI: Exploring the Potential of Large Concept Models | |
Demystifying Domain-adaptive Post-training for Financial LLMs | |
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models | |
WebWalker: Benchmarking LLMs in Web Traversal | |
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought | |
Enhancing Retrieval-Augmented Generation: A Study of Best Practices | |
Foundations of Large Language Models | |
The Lessons of Developing Process Reward Models in Mathematical Reasoning | |
ListConRanker: A Contrastive Text Reranker with Listwise Encoding | |
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training | |
MiniRAG: Towards Extremely Simple Retrieval-Augmented Generation | |
Towards Best Practices for Open Datasets for LLM Training | |
ChemAgent: Self-updating Library in Large Language Models Improves Chemical Reasoning | |
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning | |
Tensor Product Attention Is All You Need | |
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction | |
$\text{Transformer}^2$: Self-adaptive LLMs | |
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning | |
Amortizing intractable inference in large language models | |
PokerBench: Training Large Language Models to become Professional Poker Players | |
MiniMax-01: Scaling Foundation Models with Lightning Attention | |
HALoGEN: Fantastic LLM Hallucinations and Where to Find Them | |
OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training | |
A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following | |
Potential and Perils of Large Language Models as Judges of Unstructured Textual Data | |
Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | |
Adaptive Semantic Prompt Caching with VectorQ | |
How GPT learns layer by layer | |
Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models | |
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks | |
Enhancing Automated Interpretability with Output-Centric Feature Descriptions | |
From LLM to Conversational Agent: A Memory Enhanced Architecture with Fine-Tuning of Large Language Models | |
A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models | |
Fast Inference of Mixture-of-Experts Language Models with Offloading | |
In-situ graph reasoning and knowledge expansion using Graph-PReFLexOR | |
Entailed Between the Lines: Incorporating Implication into NLI | |
Multimodal LLMs Can Reason about Aesthetics in Zero-Shot | |
OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking | |
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models | |
Exploring the Inquiry-Diagnosis Relationship with Advanced Patient Simulators | |
Guiding Retrieval using LLM-based Listwise Rankers | |
Aligning Instruction Tuning with Pre-training | |
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG | |
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation | |
The Heap: A Contamination-Free Multilingual Code Dataset for Evaluating Large Language Models | |
Vision-Language Models Do Not Understand Negation | |
Task Vectors in In-Context Learning: Emergence, Formation, and Benefit | |
CodeXEmbed: A Generalist Embedding Model Family for Multiligual and Multi-task Code Retrieval | |
ComplexFuncBench: Exploring Multi-Step and Constrained Function Calling under Long-Context Scenario | |
PaSa: An LLM Agent for Comprehensive Academic Paper Search | |
Evolving Deeper LLM Thinking | |
Bridging Language Barriers in Healthcare: A Study on Arabic LLMs | |
SEAL: Entangled White-box Watermarks on Low-Rank Adaptation | |
Multiple Choice Questions: Reasoning Makes Large Language Models (LLMs) More Self-Confident Even When They Are Wrong | |
LatteReview: A Multi-Agent Framework for Systematic Review Automation Using Large Language Models | |
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding | |
Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement | |
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding | |
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models | |
Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks | |
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training | |
Reasoning Language Models: A Blueprint | |
Learn-by-interact: A Data-Centric Framework for Self-Adaptive Agents in Realistic Environments | |
Computational Protein Science in the Era of Large Language Models (LLMs) | |
FRAG: A Flexible Modular Framework for Retrieval-Augmented Generation based on Knowledge Graphs | |
SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs | |
PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play | |
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs | |
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback | |
4bit-Quantization in Vector-Embedding for RAG | |
AGENT-CQ: Automatic Generation and Evaluation of Clarifying Questions for Conversational Search with LLMs | |
MSTS: A Multimodal Safety Test Suite for Vision-Language Models | |
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model | |
Fixing Imbalanced Attention to Mitigate In-Context Hallucination of Large Vision-Language Model | |
Panoramic Interests: Stylistic-Content Aware Personalized Headline Generation | |
Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective | |
The Geometry of Tokens in Internal Representations of Large Language Models | |
Autonomy-of-Experts Models | |
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | |
FilmAgent: A Multi-Agent Framework for End-to-End Film Automation in Virtual 3D Spaces | |
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback | |
Kimi k1.5: Scaling Reinforcement Learning with LLMs | |
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning | |
Optimizing Pretraining Data Mixtures with LLM-Estimated Utility | |
Tell me about yourself: LLMs are aware of their learned behaviors | |
IntellAgent: A Multi-Agent Framework for Evaluating Conversational AI Systems | |
Pairwise RM: Perform Best-of-N Sampling with Knockout Tournament | |
Distillation Quantification for Large Language Models | |
FOCUS: First Order Concentrated Updating Scheme | |
Advancing Language Model Reasoning through Reinforcement Learning and Inference Scaling | |
FLAME: A small language model for spreadsheet formulas | |
Temporal Preference Optimization for Long-Form Video Understanding | |
Parameter-Efficient Fine-Tuning for Foundation Models | |
Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models | |
Can Large Language Models Understand Preferences in Personalized Recommendation? | |
RAG-Reward: Optimizing RAG with Reward Modeling and RLHF | |
Analyzing Continuous Semantic Shifts with Diachronic Word Similarity Matrices | |
Debate Helps Weak-to-Strong Generalization | |
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | |
Hallucinations Can Improve Large Language Models in Drug Discovery | |
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents | |
Do LLMs Provide Consistent Answers to Health-Related Questions across Languages? | |
Control LLM: Controlled Evolution for Intelligence Retention in LLM | |
Bilinear MLPs enable weight-based mechanistic interpretability | |
A Survey on Memory-Efficient Large-Scale Model Training in AI for Science | |
Synthetic Data Can Mislead Evaluations: Membership Inference as Machine Text Detection | |
Computing Optimization-Based Prompt Injections Against Closed-Weights Models By Misusing a Fine-Tuning API | |
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques | |
Chain-of-Retrieval Augmented Generation | |
OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting | |
A Survey of Graph Retrieval-Augmented Generation for Customized Large Language Models | |
Chat3GPP: An Open-Source Retrieval-Augmented Generation Framework for 3GPP Documents | |
Redundancy Principles for MLLMs Benchmarks | |
RL + Transformer = A General-Purpose Problem Solver | |
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing | |
Question Answering on Patient Medical Records with Private Fine-Tuned LLMs | |
Return of the Encoder: Maximizing Parameter Efficiency for SLMs | |
Provence: efficient and robust context pruning for retrieval-augmented generation | |
ARWKV: Pretrain is not what we need, an RNN-Attention-Based Language Model Born from Transformer | |
OpenCharacter: Training Customizable Role-Playing LLMs with Large-Scale Synthetic Personas | |
Qwen2.5-1M Technical Report | |
GaussMark: A Practical Approach for Structural Watermarking of Language Models | |
Baichuan-Omni-1.5 Technical Report | |
ASRank: Zero-Shot Re-Ranking with Answer Scent for Document Retrieval | |
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning | |
CodeMonkeys: Scaling Test-Time Compute for Software Engineering | |
Challenges in Ensuring AI Safety in DeepSeek-R1 Models: The Shortcomings of Reinforcement Learning Strategies | |
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training | |
Optimizing Large Language Model Training Using FP4 Quantization | |
Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling | |
Open Problems in Mechanistic Interpretability | |
Low-Rank Adapters Meet Neural Architecture Search for LLM Compression | |
You Do Not Fully Utilize Transformer's Representation Capacity | |
Training Dynamics of In-Context Learning in Linear Attention | |
IndicMMLU-Pro: Benchmarking Indic Large Language Models on Multi-Task Language Understanding | |
StagFormer: Time Staggering Transformer Decoding for Running Layers In Parallel | |
Self-reflecting Large Language Models: A Hegelian Dialectical Approach | |
Histoires Morales: A French Dataset for Assessing Moral Alignment | |
TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models | |
Can Transformers Learn Full Bayesian Inference in Context? | |
Sparse Autoencoders Trained on the Same Data Learn Different Features | |
FBQuant: FeedBack Quantization for Large Language Models | |
RotateKV: Accurate and Robust 2-Bit KV Cache Quantization for LLMs via Outlier-Aware Adaptive Rotations | |
DeepFlow: Serverless Large Language Model Serving at Scale | |
Can LLM Generate Regression Tests for Software Commits? | |
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models | |
WARP: An Efficient Engine for Multi-Vector Retrieval | |
Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation | |
Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate | |
Virus: Harmful Fine-tuning Attack for Large Language Models Bypassing Guardrail Moderation | |
Atla Selene Mini: A General Purpose Evaluation Model | |
Exploring the sustainable scaling of AI dilemma: A projective study of corporations' AI environmental impacts | |
Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs | |
GuardReasoner: Towards Reasoning-based LLM Safeguards | |
LLMs can see and hear without any training | |
Large Language Models Think Too Fast To Explore Effectively | |
AdditiveLLM: Large Language Models Predict Defects in Additive Manufacturing | |
Actions Speak Louder than Words: Agent Decisions Reveal Implicit Biases in Language Models | |
People who frequently use ChatGPT for writing tasks are accurate and robust detectors of AI-generated text | |
Streaming DiLoCo with overlapping communication: Towards a Distributed Free Lunch | |
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in Post-Training | |
o3-mini vs DeepSeek-R1: Which One is Safer? | |
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding | |
Beyond Turn-taking: Introducing Text-based Overlap into Human-LLM Interactions | |
s1: Simple test-time scaling | |
Trading Inference-Time Compute for Adversarial Robustness | |
R.I.P.: Better Models by Survival of the Fittest Prompts | |
Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge | |
Improving Your Model Ranking on Chatbot Arena by Vote Rigging | |
Sparse Autoencoders Can Interpret Randomly Initialized Transformers | |
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow | |
Propositional Interpretability in Artificial Intelligence | |
Function Vectors in Large Language Models | |
Do LLMs Strategically Reveal, Conceal, and Infer Information? A Theoretical and Empirical Analysis in The Chameleon Game | |
Cache Me If You Must: Adaptive Key-Value Quantization for Large Language Models | |
Reward-Guided Speculative Decoding for Efficient LLM Reasoning | |
mFollowIR: a Multilingual Benchmark for Instruction Following in Retrieval | |
Efficient Reasoning with Hidden Thinking | |
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming | |
Unraveling the Capabilities of Language Models in News Summarization | |
Self-supervised Quantized Representation for Seamlessly Integrating Knowledge Graphs with Large Language Models | |
Diverse Preference Optimization | |
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization | |
LLMs Can Plan Only If We Tell Them | |
Scalable-Softmax Is Superior for Attention | |
An introduction to graphical tensor notation for mechanistic interpretability | |
PixelWorld: Towards Perceiving Everything as Pixels | |
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training | |
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference | |
Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities | |
Towards Safe and Honest AI Agents with Neural Self-Other Overlap | |
Lifelong Sequential Knowledge Editing without Model Degradation | |
Preference Leakage: A Contamination Problem in LLM-as-a-judge | |
Process Reinforcement through Implicit Rewards | |
GFM-RAG: Graph Foundation Model for Retrieval Augmented Generation | |
DeepRAG: Thinking to Retrieval Step by Step for Large Language Models | |
ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning | |
The Jumping Reasoning Curve? Tracking the Evolution of Reasoning Performance in GPT-[n] and o-[n] Models on Multimodal Puzzles | |
FastKV: KV Cache Compression for Fast Long-Context Processing with Token-Selective Propagation | |
HintEval: A Comprehensive Framework for Hint Generation and Evaluation for Questions | |
RankFlow: A Multi-Role Collaborative Reranking Workflow Utilizing Large Language Models | |
AIN: The Arabic INclusive Large Multimodal Model | |
Querying Databases with Function Calling | |
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models | |
The Differences Between Direct Alignment Algorithms are a Blur | |
Almost Surely Safe Alignment of Large Language Models at Inference-Time | |
Beyond Limited Data: Self-play LLM Theorem Provers with Iterative Conjecturing and Proving | |
Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment | |
SafeRAG: Benchmarking Security in Retrieval-Augmented Generation of Large Language Model | |
Serving Long-Context LLMs at the Mobile Edge: Test-Time Reinforcement Learning-based Model Caching and Inference Offloading | |
AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Understanding | |
MM-IQ: Benchmarking Human-Like Abstraction and Reasoning in Multimodal Models | |
Learning to Generate Unit Tests for Automated Debugging | |
Language Models Prefer What They Know: Relative Confidence Estimation via Confidence Preferences | |
Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search | |
LongDPO: Unlock Better Long-form Generation Abilities for LLMs via Critique-augmented Stepwise Information | |
RandLoRA: Full-rank parameter-efficient fine-tuning of large models | |
Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds | |
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping | |
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search | |
Learning the RoPEs: Better 2D and 3D Position Encodings with STRING | |
Can LLMs Maintain Fundamental Abilities under KV Cache Compression? | |
ACECODER: Acing Coder RL via Automated Test-Case Synthesis | |
Harmonic Loss Trains Interpretable AI Models | |
Converting MLPs into Polynomials in Closed Form | |
Language Models Use Trigonometry to Do Addition | |
MMTEB: Massive Multilingual Text Embedding Benchmark | |
Rethinking Mixture-of-Agents: Is Mixing Different Large Language Models Beneficial? | |
Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification | |
TensorLLM: Tensorising Multi-Head Attention for Enhanced Reasoning and Compression in LLMs | |
BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving | |
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model | |
LIMO: Less is More for Reasoning | |
Demystifying Long Chain-of-Thought Reasoning in LLMs | |
Intent Representation Learning with Large Language Model for Recommendation | |
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning | |
Jailbreaking with Universal Multi-Prompts | |
Analyzing Similarity Metrics for Data Selection for Language Model Pretraining | |
Boosting Multimodal Reasoning with MCTS-Automated Structured Thinking | |
Wavelet-based Positional Representation for Long Context | |
A Probabilistic Inference Approach to Inference-Time Scaling of LLMs using Particle-Based Monte Carlo Methods | |
ScoreFlow: Mastering LLM Agent Workflows via Score-based Preference Optimization | |
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges | |
TwinMarket: A Scalable Behavioral and Social Simulation for Financial Markets | |
Activation Approximations Can Incur Safety Vulnerabilities Even in Aligned LLMs: Comprehensive Analysis and Defense | |
Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models | |
Text-to-CAD Generation Through Infusing Visual Feedback in Large Language Models | |
Deriving Activation Functions Using Integration | |
FlowLLM: Flow Matching for Material Generation with Large Language Models as Base Distributions | |
Large Language Model Guided Self-Debugging Code Generation | |
OmniRL: In-Context Reinforcement Learning by Large-Scale Meta-Training in Randomized Worlds | |
On Teacher Hacking in Language Model Distillation | |
Riddle Me This! Stealthy Membership Inference for Retrieval-Augmented Generation | |
Activation-Informed Merging of Large Language Models | |
HackerRank-ASTRA: Evaluating Correctness & Consistency of Large Language Models on cross-domain multi-file project problems | |
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment | |
Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions | |
Beyond Prompt Content: Enhancing LLM Performance via Content-Format Integrated Prompt Optimization | |
PILAF: Optimal Human Preference Sampling for Reward Modeling | |
MAGA: MAssive Genre-Audience Reformulation to Pretraining Corpus Expansion | |
MRAMG-Bench: A BeyondText Benchmark for Multimodal Retrieval-Augmented Multimodal Generation | |
UltraIF: Advancing Instruction Following from the Wild | |
BOLT: Bootstrap Long Chain-of-Thought in Language Models without Distillation | |
Syntriever: How to Train Your Retriever with Synthetic Data from LLMs | |
Analyze Feature Flow to Enhance Interpretation and Steering in Language Models | |
Great Models Think Alike and this Undermines AI Oversight | |
ChartCitor: Multi-Agent Framework for Fine-Grained Chart Visual Attribution | |
PlotGen: Multi-Agent LLM-based Scientific Data Visualization via Multimodal Feedback | |
Enhancing Code Generation for Low-Resource Languages: No Silver Bullet | |
Memorize and Rank: Elevating Large Language Models for Clinical Diagnosis Prediction | |
LLM Alignment as Retriever Optimization: An Information Retrieval Perspective | |
Partially Rewriting a Transformer in Natural Language | |
MedRAX: Medical Reasoning Agent for Chest X-ray | |
Do Large Language Model Benchmarks Test Reliability? | |
Loss Functions and Operators Generated by f-Divergences | |
xJailbreak: Representation Space Guided Reinforcement Learning for Interpretable LLM Jailbreaking | |
QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation | |
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach | |
DuoGuard: A Two-Player RL-Driven Framework for Multilingual LLM Guardrails | |
Lost in Time: Clock and Calendar Understanding Challenges in Multimodal LLMs | |
QuEST: Stable Training of LLMs with 1-Bit Weights and Activations | |
Generating Symbolic World Models via Test-time Scaling of Large Language Models | |
ARR: Question Answering with Large Language Models via Analyzing, Retrieving, and Reasoning | |
Linear Correlation in LM's Compositional Generalization and Hallucination | |
When One LLM Drools, Multi-LLM Collaboration Rules | |
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | |
MedRAG: Enhancing Retrieval-augmented Generation with Knowledge Graph-Elicited Reasoning for Healthcare Copilot | |
Step Back to Leap Forward: Self-Backtracking for Boosting Reasoning of Language Models | |
MEETING DELEGATE: Benchmarking LLMs on Attending Meetings on Our Behalf | |
CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance | |
Multi-agent Architecture Search via Agentic Supernet | |
Advancing Reasoning in Large Language Models: Promising Methods and Approaches | |
CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning | |
ALU: Agentic LLM Unlearning | |
Training Language Models to Reason Efficiently | |
Sparse Autoencoders for Hypothesis Generation | |
It's All in The [MASK]: Simple Instruction-Tuning Enables BERT-like Masked Language Models As Generative Classifiers | |
SPARC: Subspace-Aware Prompt Adaptation for Robust Continual Learning in LLMs | |
EVEv2: Improved Baselines for Encoder-Free Vision-Language Models | |
Matryoshka Quantization | |
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning | |
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates | |
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling | |
Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM | |
LM2: Large Memory Models | |
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding | |
APE: Faster and Longer Context-Augmented Generation via Adaptive Parallel Encoding | |
Hypencoder: Hypernetworks for Information Retrieval | |
The Hidden Life of Tokens: Reducing Hallucination of Large Vision-Language Models via Visual Information Steering | |
Leveraging the true depth of LLMs | |
Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies | |
Develop AI Agents for System Engineering in Factorio | |
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders | |
Augmenting Self-attention with Persistent Memory | |
SynthDetoxM: Modern LLMs are Few-Shot Parallel Detoxification Data Annotators | |
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents | |
Auditing Prompt Caching in Language Model APIs | |
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning for Financial Trading | |
Goedel-Prover: A Frontier Model for Open-Source Automated Theorem Proving | |
Scaling Pre-training to One Hundred Billion Data for Vision Language Models | |
O1 Embedder: Let Retrievers Think Before Action | |
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon | |
LLMs Can Easily Learn to Reason from Demonstrations Structure, not content, is what matters! | |
CodeI/O: Condensing Reasoning Patterns via Code Input-Output Prediction | |
TWICE: What Advantages Can Low-Resource Domain-Specific Embedding Model Bring? - A Case Study on Korea Financial Texts | |
Towards Trustworthy Retrieval Augmented Generation for Large Language Models: A Survey | |
Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests | |
Knowledge Graph-Guided Retrieval Augmented Generation | |
Competitive Programming with Large Reasoning Models | |
Solving the Content Gap in Roblox Game Recommendations: LLM-Based Profile Generation and Reranking | |
DeepCrossAttention: Supercharging Transformer Residual Connections | |
Towards Internet-Scale Training For Agents | |
Optimizing Temperature for Language Models with Multi-Sample Inference | |
On the Emergence of Thinking in LLMs I: Searching for the Right Intuition | |
Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training | |
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | |
Confidence Improves Self-Consistency in LLMs | |
Teaching Language Models to Critique via Reinforcement Learning | |
Training Language Models for Social Deduction with Multi-Agent Reinforcement Learning | |
FactIR: A Real-World Zero-shot Open-Domain Retrieval Benchmark for Fact-Checking | |
The Curse of Depth in Large Language Models | |
CODESIM: Multi-Agent Code Generation and Problem Solving through Simulation-Driven Planning and Debugging | |
Graph-Based Vector Search: An Experimental Evaluation of the State-of-the-Art | |
DarwinLM: Evolutionary Structured Pruning of Large Language Models | |
Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension | |
PIM Is All You Need: A CXL-Enabled GPU-Free System for Large Language Model Inference | |
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More | |
DPO-Shift: Shifting the Distribution of Direct Preference Optimization | |
Bag of Tricks for Inference-time Computation of LLM Reasoning | |
Gemstones: A Model Suite for Multi-Faceted Scaling Laws | |
Expect the Unexpected: FailSafe Long Context QA for Finance | |
Enhancing Financial Time-Series Forecasting with Retrieval-Augmented Large Language Models | |
Distillation Scaling Laws | |
Transfer Learning of Tabular Data by Finetuning Large Language Models | |
CoS: Chain-of-Shot Prompting for Long Video Understanding | |
LLM Pretraining with Continuous Concepts | |
Fino1: On the Transferability of Reasoning Enhanced LLMs to Finance | |
TransMLA: Multi-head Latent Attention Is All You Need | |
Automated Capability Discovery via Model Self-Exploration | |
Harnessing Language's Fractal Geometry with Recursive Inference Scaling | |
When More is Less: Understanding Chain-of-Thought Length in LLMs | |
Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models | |
NoLiMa: Long-Context Evaluation Beyond Literal Matching | |
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data | |
The MoE-Empowered Edge LLMs Deployment: Architecture, Challenges, and Opportunities | |
MetaSC: Test-Time Safety Specification Optimization for Language Models | |
LLM Modules: Knowledge Transfer from a Large to a Small Model using Enhanced Cross-Attention | |
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models | |
Language Models Can Teach Themselves to Program Better | |
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning | |
Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing | |
Reasoning-as-Logic-Units: Scaling Test-Time Reasoning in Large Language Models Through Logic Unit Alignment | |
PDE-Controller: LLMs for Autoformalization and Reasoning of PDEs | |
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs | |
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers | |
The Geometry of Prompting: Unveiling Distinct Mechanisms of Task Adaptation in Language Models | |
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models | |
CoT-Valve: Length-Compressible Chain-of-Thought Tuning | |
EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents | |
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG | |
Logical Reasoning in Large Language Models: A Survey | |
An Open Recipe: Adapting Language-Specific LLMs to a Reasoning Model in One Day via Model Merging | |
Typhoon T1: An Open Thai Reasoning Model | |
Diversity Enhances an LLM's Performance in RAG and Long-context Task | |
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding | |
InfiniteHiP: Extending Language Model Context Up to 3 Million Tokens on a Single GPU | |
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency | |
Exploring the Potential of Encoder-free Architectures in 3D LMMs | |
SQuARE: Sequential Question Answering Reasoning Engine for Enhanced Chain-of-Thought in Large Language Models | |
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles | |
EnigmaEval: A Benchmark of Long Multimodal Reasoning Challenges | |
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient | |
Mathematical Reasoning in Large Language Models: Assessing Logical and Arithmetic Errors across Wide Numerical Ranges | |
Escaping Collapse: The Strength of Weak Data for Large Language Model Training | |
CopySpec: Accelerating LLMs with Speculative Copy-and-Paste Without Compromising Quality | |
Human-LLM Coevolution: Evidence from Academic Writing | |
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity? | |
Mastering the Craft of Data Synthesis for CodeLLMs | |
Towards Semantic Versioning of Open Pre-trained Language Model Releases on Hugging Face | |
How Green are Neural Language Models? Analyzing Energy Consumption in Text Summarization Fine-tuning | |
Enabling Autoregressive Models to Fill In Masked Tokens | |
GENERator: A Long-Context Generative Genomic Foundation Model | |
Token-Hungry, Yet Precise: DeepSeek R1 Highlights the Need for Multi-Step Reasoning Over Speed in MATH | |
ToolFactory: Automating Tool Generation by Leveraging LLM to Understand REST API Documentations | |
MM-RLHF: The Next Step Forward in Multimodal LLM Alignment | |
Large Language Diffusion Models | |
Small Models, Big Impact: Efficient Corpus and Graph-Based Adaptation of Small Multilingual Language Models for Low-Resource Languages | |
V2V-LLM: Vehicle-to-Vehicle Cooperative Autonomous Driving with Multi-Modal Large Language Models | |
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? | |
LaRA: Benchmarking Retrieval-Augmented Generation and Long-Context LLMs - No Silver Bullet for LC or RAG Routing | |
KGGen: Extracting Knowledge Graphs from Plain Text with Language Models | |
Diverse Inference and Verification for Advanced Reasoning | |
FoNE: Precise Single-Token Number Embeddings via Fourier Features | |
ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models | |
Jailbreaking to Jailbreak | |
Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | |
We Can't Understand AI Using our Existing Vocabulary | |
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation | |
Scaling Test-Time Compute Without Verification or RL is Suboptimal | |
PhysReason: A Comprehensive Benchmark towards Physics-Based Reasoning | |
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o Under Data Scarcity | |
LIMR: Less is More for RL Scaling | |
Bitnet.cpp: Efficient Edge Inference for Ternary LLMs | |
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model | |
Language Complexity Measurement as a Noisy Zero-Shot Proxy for Evaluating LLM Performance | |
Large Language Models and Mathematical Reasoning Failures | |
SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL | |
Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents | |
ExaGPT: Example-Based Machine-Generated Text Detection for Human Interpretability | |
System Message Generation for User Preferences using Open-Source Models | |
Cuckoo: An IE Free Rider Hatched by Massive Nutrition in LLM's Nest | |
How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training | |
ReLearn: Unlearning via Learning for Large Language Models | |
The Mirage of Model Editing: Revisiting Evaluation in the Wild | |
Is That Your Final Answer? Test-Time Scaling Improves Selective Question Answering | |
SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors | |
Dyve: Thinking Fast and Slow for Dynamic Process Verification | |
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems | |
CoLA: Compute-Efficient Pre-Training of LLMs via Low-Rank Activation | |
KernelBench: Can LLMs Write Efficient GPU Kernels? | |
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question Answering? | |
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs | |
Data Valuation using Neural Networks for Efficient Instruction Fine-Tuning | |
Show Me the Work: Fact-Checkers' Requirements for Explainable Automated Fact-Checking | |
CRANE: Reasoning with constrained LLM generation | |
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation | |
Can a Single Model Master Both Multi-turn Conversations and Tool Use? CALM: A Unified Conversational Agentic Language Model | |
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks | |
Self-Supervised Prompt Optimization | |
Self-Data Distillation for Recovering Quality in Pruned Large Language Models | |
HopRAG: Multi-Hop Reasoning for Logic-Aware Retrieval-Augmented Generation | |
Rethinking Diverse Human Preference Learning through Principal Component Analysis | |
Sailor2: Sailing in South-East Asia with Inclusive Multilingual LLMs | |
HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | |
Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge | |
Autellix: An Efficient Serving Engine for LLM Agents as General Programs | |
Why Safeguarded Ships Run Aground? Aligned Large Language Models' Safety Mechanisms Tend to Be Anchored in The Template Region | |
AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence | |
Qwen2.5-VL Technical Report | |
LongPO: Long Context Self-Evolution of Large Language Models through Short-to-Long Preference Optimization | |
Craw4LLM: Efficient Web Crawling for LLM Pretraining | |
Thinking Preference Optimization | |
Magma: A Foundation Model for Multimodal AI Agents | |
NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions | |
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation | |
Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity | |
Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks | |
LLM-Powered Proactive Data Systems | |
Flow-of-Options: Diversified and Improved LLM Reasoning by Thinking Through Options | |
Soundwave: Less is More for Speech-Text Alignment in LLMs | |
PAFT: Prompt-Agnostic Fine-Tuning | |
Concise Reasoning via Reinforcement Learning | |
Baichuan-M1: Pushing the Medical Capability of Large Language Models | |
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research | |
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1 | |
Reasoning on a Spectrum: Aligning LLMs to System 1 and System 2 Thinking | |
SafeRoute: Adaptive Model Selection for Efficient and Accurate Safety Guardrails in Large Language Models | |
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities? | |
MUDDFormer: Breaking Residual Bottlenecks in Transformers via Multiway Dynamic Dense Connections | |
Small Models Struggle to Learn from Strong Reasoners | |
Scaling Autonomous Agents via Automatic Reward Modeling And Planning | |
Atom of Thoughts for Markov LLM Test-Time Scaling | |
OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning | |
FinMTEB: Finance Massive Text Embedding Benchmark | |
Multilingual Encoder Knows more than You Realize: Shared Weights Pretraining for Extremely Low-Resource Languages | |
Injecting Domain-Specific Knowledge into Large Language Models: A Comprehensive Survey | |
HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | |
RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision | |
Proving Olympiad Inequalities by Synergizing LLMs and Symbolic Reasoning | |
GIMMICK -- Globally Inclusive Multimodal Multitask Cultural Knowledge Benchmarking | |
TrustRAG: An Information Assistant with Retrieval Augmented Generation | |
REFIND: Retrieval-Augmented Factuality Hallucination Detection in Large Language Models | |
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | |
Learning to Reason at the Frontier of Learnability | |
Presumed Cultural Identity: How Names Shape LLM Responses | |
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning | |
StepTool: Enhancing Multi-Step Tool Usage in LLMs through Step-Grained Reinforcement Learning | |
Advancing Spatial Reasoning in Large Language Models: An In-Depth Evaluation and Enhancement Using the StepGame Benchmark | |
TESS 2: A Large-Scale Generalist Diffusion Language Model | |
Judging the Judges: A Collection of LLM-Generated Relevance Judgements | |
From Tools to Teammates: Evaluating LLMs in Multi-Session Coding Interactions | |
Reducing Hallucinations in Language Model-based SPARQL Query Generation Using Post-Generation Memory Retrieval | |
REALTALK: A 21-Day Real-World Dataset for Long-Term Conversation | |
AIDE: AI-Driven Exploration in the Space of Code | |
MVL-SIB: A Massively Multilingual Vision-Language Benchmark for Cross-Modal Topical Matching | |
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation | |
LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models | |
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features | |
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning | |
On the Influence of Context Size and Model Choice in Retrieval-Augmented Generation Systems | |
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines | |
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO | |
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | |
MLGym: A New Framework and Benchmark for Advancing AI Research Agents | |
S*: Test Time Scaling for Code Generation | |
On the Trustworthiness of Generative Foundation Models: Guideline, Assessment, and Perspective | |
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC | |
Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information | |
Interpretable Text Embeddings and Text Similarity Explanation: A Primer | |
ETS: Efficient Tree Search for Inference-Time Scaling | |
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models | |
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? | |
Unstructured Evidence Attribution for Long Context Query Focused Summarization | |
Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems | |
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning | |
How Much Do LLMs Hallucinate across Languages? On Multilingual Estimation of LLM Hallucination in the Wild | |
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention | |
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback | |
CLIPPER: Compression enables long-context synthetic data generation | |
LLM-based User Profile Management for Recommender System | |
Multimodal RewardBench: Holistic Evaluation of Reward Models for Vision Language Models | |
Enhancing Cognition and Explainability of Multimodal Foundation Models with Self-Synthesized Data | |
The Imitation Game According To Turing | |
You Can't Eat Your Cake and Have It Too: The Performance Degradation of LLMs with Jailbreak Defense | |
Trojan Detection Through Pattern Recognition for Large Language Models | |
How to Get Your LLM to Generate Challenging Problems for Evaluation | |
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images | |
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs | |
RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | |
Diversity-driven Data Selection for Language Model Tuning through Sparse Autoencoder | |
H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models, Including OpenAI o1/o3, DeepSeek-R1, and Gemini 2.0 Flash Thinking | |
The underlying structures of self-attention: symmetry, directionality, and emergent dynamics in Transformer training | |
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence | |
VERT: Verified Equivalent Rust Transpilation with Large Language Models as Few-Shot Learners | |
Large language models and (non-)linguistic recursion | |
MathConstruct: Challenging LLM Reasoning with Constructive Proofs | |
Enhancing Reasoning to Adapt Large Language Models for Domain-Specific Applications | |
Unveiling Simplicities of Attention: Adaptive Long-Context Head Identification | |
Automated Hypothesis Validation with Agentic Sequential Falsifications | |
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer | |
LightThinker: Thinking Step-by-Step Compression | |
PIP-KAG: Mitigating Knowledge Conflicts in Knowledge-Augmented Generation via Parametric Pruning | |
UPCORE: Utility-Preserving Coreset Selection for Balanced Unlearning | |
More for Keys, Less for Values: Adaptive KV Cache Quantization | |
SIFT: Grounding LLM Reasoning in Contexts via Stickers | |
Think Inside the JSON: Reinforcement Strategy for Strict LLM Schema Adherence | |
PathRAG: Pruning Graph-based Retrieval Augmented Generation with Relational Paths | |
SurveyX: Academic Survey Automation via Large Language Models | |
StructFlowBench: A Structured Flow Benchmark for Multi-turn Instruction Following | |
Mol-LLaMA: Towards General Understanding of Molecules in Large Molecular Language Model | |
LLM-Microscope: Uncovering the Hidden Role of Punctuation in Context Memory of Transformers | |
KITAB-Bench: A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding | |
MoBA: Mixture of Block Attention for Long-Context LLMs | |
Evaluating Multimodal Generative AI with Korean Educational Standards | |
Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models | |
Beyond No: Quantifying AI Over-Refusal and Emotional Attachment Boundaries | |
Large Language Models for Cryptocurrency Transaction Analysis: A Bitcoin Case Study | |
mStyleDistance: Multilingual Style Embeddings and their Evaluation | |
Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis | |
MedHallu: A Comprehensive Benchmark for Detecting Medical Hallucinations in Large Language Models | |
Benchmarking LLMs for Political Science: A United Nations Perspective | |
Linguistic Generalizability of Test-Time Scaling in Mathematical Reasoning | |
Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models | |
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models | |
Thus Spake Long-Context Large Language Model | |
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | |
Benchmarking Temporal Reasoning and Alignment Across Chinese Dynasties | |
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | |
Beyond Release: Access Considerations for Generative AI Systems | |
Audio-FLAN: A Preliminary Release | |
Towards Fully-Automated Materials Discovery via Large-Scale Synthesis Dataset and Expert-Level LLM-as-a-Judge | |
Linear Attention for Efficient Bidirectional Sequence Modeling | |
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models | |
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam | |
Empowering LLMs with Logical Reasoning: A Comprehensive Survey | |
Activation Steering in Neural Theorem Provers | |
Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease | |
PLDR-LLMs Learn A Generalizable Tensor Operator That Can Replace Its Own Deep Neural Net At Inference | |
PlanGEN: A Multi-Agent Framework for Generating Planning and Reasoning Trajectories for Complex Problem Solving | |
Reasoning with Latent Thoughts: On the Power of Looped Transformers | |
COSMOS: A Hybrid Adaptive Optimizer for Memory-Efficient Training of LLMs | |
Baichuan-Audio: A Unified Framework for End-to-End Speech Interaction | |
LettuceDetect: A Hallucination Detection Framework for RAG Applications | |
Mapping 1,000+ Language Models via the Log-Likelihood Vector | |
M3-AGIQA: Multimodal, Multi-Round, Multi-Aspect AI-Generated Image Quality Assessment | |
Grounded Persuasive Language Generation for Automated Marketing | |
Compression Scaling Laws: Unifying Sparsity and Quantization | |
Self-Taught Agentic Long Context Understanding | |
InductionBench: LLMs Fail in the Simplest Complexity Class | |
Towards an AI co-scientist | |
Investigating the Impact of Quantization Methods on the Safety and Reliability of Large Language Models | |
DRAMA: Diverse Augmentation from Large Language Models to Smaller Dense Retrievers | |
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution | |
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference | |
WebGames: Challenging General-Purpose Web-Browsing AI Agents | |
Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective | |
AAD-LLM: Neural Attention-Driven Auditory Scene Understanding | |
MutaGReP: Execution-Free Repository-Grounded Plan Search for Code-Use | |
Scale-Distribution Decoupling: Enabling Stable and Effective Training of Large Language Models | |
TheoremExplainAgent: Towards Multimodal Explanations for LLM Theorem Understanding | |
Can Large Language Models Detect Errors in Long Chain-of-Thought Reasoning? | |
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems | |
Kanana: Compute-efficient Bilingual Language Models | |
Plutus: Benchmarking Large Language Models in Low-Resource Greek Finance | |
WiCkeD: A Simple Method to Make Multiple Choice Benchmarks More Challenging | |
ViDoRAG: Visual Document Retrieval-Augmented Generation via Dynamic Iterative Reasoning Agents | |
Language Models' Factuality Depends on the Language of Inquiry | |
Scaling LLM Pre-training with Vocabulary Curriculum | |
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization | |
The Lottery LLM Hypothesis, Rethinking What Abilities Should LLM Compression Preserve? | |
ReaderLM-v2: Small Language Model for HTML to Markdown and JSON | |
Introducing Visual Perception Token into Multimodal Large Language Model | |
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs | |
Shakti-VLMs: Scalable Vision-Language Models for Enterprise AI | |
Finding the Sweet Spot: Preference Data Construction for Scaling Preference Optimization | |
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement | |
Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents | |
LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models | |
Prompt-to-Leaderboard | |
Optimizing Model Selection for Compound AI Systems | |
Inner Thinking Transformer: Leveraging Dynamic Depth Scaling to Foster Adaptive Internal Thinking | |
MixMin: Finding Data Mixtures via Convex Minimization | |
Can Language Models Falsify? Evaluating Algorithmic Reasoning with Counterexample Creation | |
Project Alexandria: Towards Freeing Scientific Knowledge from Copyright Burdens via LLMs | |
CritiQ: Mining Data Quality Criteria from Human Preferences | |
Rank1: Test-Time Compute for Reranking in Information Retrieval | |
BIG-Bench Extra Hard | |
All That Glitters is Not Novel: Plagiarism in AI Generated Research | |
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users | |
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | |
Bi'an: A Bilingual Benchmark and Model for Hallucination Detection in Retrieval-Augmented Generation | |
KiRAG: Knowledge-Driven Iterative Retriever for Enhancing Retrieval-Augmented Generation | |
LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers | |
(Mis)Fitting: A Survey of Scaling Laws | |
Towards Optimal Multi-draft Speculative Decoding | |
Training a Generally Curious Agent | |
Reward Shaping to Mitigate Reward Hacking in RLHF | |
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | |
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning | |
LongRoPE2: Near-Lossless LLM Context Window Scaling | |
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge | |
Self-rewarding correction for mathematical reasoning | |
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving | |
Granite Embedding Models | |
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning | |
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations | |
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices | |
Lean and Mean: Decoupled Value Policy Optimization with Global Value Guidance | |
CODESYNC: Synchronizing Large Language Models with Dynamic Code Evolution at Scale | |
NeoBERT: A Next-Generation BERT | |
An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs | |
On Relation-Specific Neurons in Large Language Models | |
Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System | |
Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners | |
Is Your Paper Being Reviewed by an LLM? A New Benchmark Dataset and Approach for Detecting AI Text in Peer Review | |
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases | |
Applications of Large Models in Medicine | |
Agent-centric Information Access | |
A Systematic Survey of Automatic Prompt Optimization Techniques | |
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs | |
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models | |
DeepSolution: Boosting Complex Engineering Solution Design via Tree-based Exploration and Bi-point Thinking | |
Retrieval Backward Attention without Additional Training: Enhance Embeddings of Large Language Models via Repetition | |
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers | |
Chain of Draft: Thinking Faster by Writing Less | |
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval | |
Multi-Turn Code Generation Through Single-Step Rewards | |
Preference Learning Unlocks LLMs' Psycho-Counseling Skills | |
Large-Scale Data Selection for Instruction Tuning | |
Visual-RFT: Visual Reinforcement Fine-Tuning | |
Retrieval Models Aren't Tool-Savvy: Benchmarking Tool Retrieval for Large Language Models | |
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs | |
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs | |
Predictive Data Selection: The Data That Predicts Is the Data That Teaches | |
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting | |
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test | |
U-NIAH: Unified RAG and LLM Evaluation for Long Context Needle-In-A-Haystack | |
CodeArena: A Collective Evaluation Platform for LLM Code Generation | |
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments | |
Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia | |
DOVE: A Large-Scale Multi-Dimensional Predictions Dataset Towards Meaningful LLM Evaluation | |
Liger: Linearizing Large Language Models to Gated Recurrent Structures | |
AI-Invented Tonal Languages: Preventing a Machine Lingua Franca Beyond Human Understanding | |
From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens | |
SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity | |
When an LLM is apprehensive about its answers -- and when its uncertainty is justified | |
PodAgent: A Comprehensive Framework for Podcast Generation | |
General Reasoning Requires Learning to Reason from the Get-go | |
Efficient Test-Time Scaling via Self-Calibration | |
Why Are Web AI Agents More Vulnerable Than Standalone LLMs? A Security Analysis | |
Wikipedia in the Era of LLMs: Evolution and Risks | |
Language Models can Self-Improve at State-Value Estimation for Better Search | |
Mask-DPO: Generalizable Fine-grained Factuality Alignment of LLMs | |
MPO: Boosting LLM Agents with Meta Plan Optimization | |
Teaching Metric Distance to Autoregressive Multimodal Foundational Models | |
ATLaS: Agent Tuning via Learning Critical Steps | |
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents | |
PipeOffload: Improving Scalability of Pipeline Parallelism with Memory Optimization | |
Systems and Algorithms for Convolutional Multi-Hybrid Language Models at Scale | |
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | |
FR-Spec: Accelerating Large-Vocabulary Language Models via Frequency-Ranked Speculative Sampling | |
State Stream Transformer (SST) : Emergent Metacognitive Behaviours Through Latent State Persistence | |
DINT Transformer | |
Matryoshka Re-Ranker: A Flexible Re-Ranking Architecture With Configurable Depth and Width | |
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data? | |
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | |
Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models | |
Attention is All You Need Until You Need Retention | |
Complexity Control Facilitates Reasoning-Based Compositional Generalization in Transformers | |
FinerWeb-10BT: Refining Web Data with LLM-Based Line-Level Filtering | |
Shrink the longest: improving latent space isotropy with simplicial geometry | |
Mapping the Edge of Chaos: Fractal-Like Boundaries in The Trainability of Decoder-Only Transformer Models | |
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention | |
Why Are Positional Encodings Nonessential for Deep Autoregressive Transformers? Revisiting a Petroglyph | |
Superposition in Transformers: A Novel Way of Building Mixture of Experts | |
Proactive Conversational Agents with Inner Thoughts | |
Chunk-Distilled Language Modeling | |
Transformer with Fourier Integral Attentions | |
AppAgentX: Evolving GUI Agents as Proficient Smartphone Users | |
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging | |
Iterative Value Function Optimization for Guided Decoding | |
LADDER: Self-Improving LLMs Through Recursive Problem Decomposition | |
Q-Filters: Leveraging QK Geometry for Efficient KV Cache Compression | |
A Token-level Text Image Foundation Model for Document Understanding | |
(How) Do Language Models Track State? | |
Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks | |
Forgetting Transformer: Softmax Attention with a Forget Gate | |
Societal Alignment Frameworks Can Improve LLM Alignment | |
PowerAttention: Exponentially Scaling of Receptive Fields for Effective Sparse Attention | |
One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings | |
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding | |
MCiteBench: A Benchmark for Multimodal Citation Text Generation in MLLMs | |
IFIR: A Comprehensive Benchmark for Evaluating Instruction-Following in Expert-Domain Information Retrieval | |
Babel: Open Multilingual Large Language Models Serving Over 90% of Global Speakers | |
Interact, Instruct to Improve: A LLM-Driven Parallel Actor-Reasoner Framework for Enhancing Autonomous Vehicle Interactions | |
ABC: Achieving Better Control of Multimodal Embeddings using VLMs | |
Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions | |
CrowdSelect: Synthetic Instruction Data Selection with Multi-LLM Wisdom | |
Benchmarking Large Language Models for Multi-Language Software Vulnerability Detection | |
SwiLTra-Bench: The Swiss Legal Translation Benchmark | |
HoT: Highlighted Chain of Thought for Referencing Supporting Facts from Inputs | |
Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective | |
Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | |
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs | |
START: Self-taught Reasoner with Tools | |
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization | |
More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG | |
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks | |
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion | |
Token-Efficient Long Video Understanding for Multimodal LLMs | |
PokéChamp: an Expert-level Minimax Language Agent | |
LLM as a Broken Telephone: Iterative Generation Distorts Information | |
Measuring Faithfulness of Chains of Thought by Unlearning Reasoning Steps | |
L$^2$M: Mutual Information Scaling Law for Long-Context Language Modeling | |
Multi Agent based Medical Assistant for Edge Devices | |
Lost in Literalism: How Supervised Training Shapes Translationese in LLMs | |
LINGOLY-TOO: Disentangling Memorisation from Reasoning with Linguistic Templatisation and Orthographic Obfuscation | |
Identifying Sensitive Weights via Post-quantization Integral | |
On the Acquisition of Shared Grammatical Representations in Bilingual Language Models | |
TRACT: Regression-Aware Fine-tuning Meets Chain-of-Thought Reasoning for LLM-as-a-Judge | |
Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions | |
Enough Coin Flips Can Make LLMs Act Bayesian | |
Position: Don't use the CLT in LLM evals with fewer than a few hundred datapoints | |
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning | |
From Language to Cognition: How LLMs Outgrow the Human Language Network | |
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning | |
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning | |
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive Cognitive-Inspired Sketching | |
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model | |
TinyR1-32B-Preview: Boosting Accuracy with Branch-Merge Distillation | |
Shifting Long-Context LLMs Research from Input to Output | |
Learning from Failures in Multi-Attempt Reinforcement Learning | |
EuroBERT: Scaling Multilingual Encoders for European Languages | |
LoRACode: LoRA Adapters for Code Embeddings | |
SAGE: A Framework of Precise Retrieval for RAG | |
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | |
An Empirical Study on Eliciting and Improving R1-like Reasoning Models | |
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information | |
LONGCODEU: Benchmarking Long-Context Language Models on Long Code Understanding | |
RuCCoD: Towards Automated ICD Coding in Russian | |
Know You First and Be You Better: Modeling Human-Like User Simulators via Implicit Profiles | |
Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation | |
A Little Depth Goes a Long Way: The Expressive Power of Log-Depth Transformers | |
Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing | |
G-Designer: Architecting Multi-agent Communication Topologies via Graph Neural Networks | |
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs | |
Power-Softmax: Towards Secure LLM Inference over Encrypted Data | |
SEAP: Training-free Sparse Expert Activation Pruning Unlock the Brainpower of Large Language Models | |
Detection Avoidance Techniques for Large Language Models | |
MM-Eureka: Exploring Visual Aha Moment with Rule-based Large-scale Reinforcement Learning | |
Automated Movie Generation via Multi-Agent CoT Planning | |
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning | |
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models | |
FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation | |
Agent models: Internalizing Chain-of-Action Generation into Reasoning models | |
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning | |
Words or Vision: Do Vision-Language Models Have Blind Faith in Text? | |
Should VLMs be Pre-trained with Image Data? | |
GRITHopper: Decomposition-Free Multi-Hop Dense Retrieval | |
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning | |
RePO: ReLU-based Preference Optimization | |
ProBench: Judging Multimodal Foundation Models on Open-ended Multi-domain Expert Tasks | |
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces | |
WritingBench: A Comprehensive Benchmark for Generative Writing | |
SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing | |
Feature-Level Insights into Artificial Text Detection with Sparse Autoencoders | |
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs | |
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations | |
This Is Your Doge, If It Please You: Exploring Deception and Robustness in Mixture of LLMs | |
Beyond RAG: Task-Aware KV Cache Compression for Comprehensive Knowledge Reasoning | |
Symbolic Mixture-of-Experts: Adaptive Skill-based Routing for Heterogeneous Reasoning | |
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries | |
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning | |
Rank-R1: Enhancing Reasoning in LLM-based Document Rerankers via Reinforcement Learning | |
Enhancing Reasoning with Collaboration and Memory | |
What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces | |
Perplexity Trap: PLM-Based Retrievers Overrate Low Perplexity Documents | |
YuE: Scaling Open Foundation Models for Long-Form Music Generation | |
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories | |
Gemini Embedding: Generalizable Embeddings from Gemini | |
Mixture of Experts Made Intrinsically Interpretable | |
Implicit Reasoning in Transformers is Reasoning through Shortcuts | |
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL | |
Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | |
BiasEdit: Debiasing Stereotyped Language Models via Model Editing | |
AI-native Memory 2.0: Second Me | |
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation | |
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering | |
Exploiting Instruction-Following Retrievers for Malicious Information Retrieval | |
Robusto-1 Dataset: Comparing Humans and VLMs on real out-of-distribution Autonomous Driving VQA from Peru | |
Collapse of Dense Retrievers: Short, Early, and Literal Biases Outranking Factual Evidence | |
Capacity-Aware Inference: Mitigating the Straggler Effect in Mixture of Experts | |
LocAgent: Graph-Guided LLM Agents for Code Localization | |
PlainQAFact: Automatic Factuality Evaluation Metric for Biomedical Plain Language Summaries Generation | |
Confident Adaptive Language Modeling | |
SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability | |
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning | |
Cost-Optimal Grouped-Query Attention for Long-Context LLMs | |
Quantizing Large Language Models for Code Generation: A Differentiated Replication | |
WildIFEval: Instruction Following in the Wild | |
LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning | |
Self-Taught Self-Correction for Small Language Models | |
The First Few Tokens Are All You Need: An Efficient and Effective Unsupervised Prefix Fine-Tuning Method for Reasoning Models | |
How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach | |
Protein Large Language Models: A Comprehensive Survey | |
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs | |
Forecasting Rare Language Model Behaviors | |
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration | |
Query-Dependent Prompt Evaluation and Optimization with Offline Inverse RL | |
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search | |
UniGoal: Towards Universal Zero-shot Goal-oriented Navigation | |
Transformers without Normalization | |
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization | |
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning | |
IteRABRe: Iterative Recovery-Aided Block Reduction | |
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond | |
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models | |
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo | |
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning | |
AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation | |
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention | |
MinorBench: A hand-built benchmark for content-based risks for children | |
A Frustratingly Simple Yet Highly Effective Attack Baseline: Over 90% Success Rate Against the Strong Black-box Models of GPT-4.5/4o/o1 | |
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | |
Medical Hallucinations in Foundation Models and Their Impact on Healthcare | |
KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation | |
Language Models Fail to Introspect About Their Knowledge of Language | |
Constructions are Revealed in Word Distributions | |
API Agents vs. GUI Agents: Divergence and Convergence | |
A Survey on Knowledge-Oriented Retrieval-Augmented Generation | |
Small Vision-Language Models: A Survey on Compact Architectures and Techniques | |
Can Large Reasoning Models do Analogical Reasoning under Perceptual Uncertainty? | |
Generative Modelling for Mathematical Discovery | |
Ordered Semantically Diverse Sampling for Textual Data | |
Semantic Wave Functions: Exploring Meaning in Large Language Models Through Quantum Formalism | |
Evaluation of the Automated Labeling Method for Taxonomic Nomenclature Through Prompt-Optimized Large Language Model | |
LimTopic: LLM-based Topic Modeling and Text Summarization for Analyzing Scientific Articles' Limitations | |
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM | |
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges | |
ARMOR v0.1: Empowering Autoregressive Multimodal Understanding Model with Interleaved Multimodal Generation via Asymmetric Synergy | |
WorkflowLLM: Enhancing Workflow Orchestration Capability of Large Language Models | |
Unveiling the Secret Recipe: A Guide For Supervised Fine-Tuning Small LLMs | |
Why do language models perform worse for morphologically complex languages? | |
When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages | |
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | |
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization | |
reWordBench: Benchmarking and Improving the Robustness of Reward Models with Transformed Inputs | |
RAGO: Systematic Performance Optimization for Retrieval-Augmented Generation Serving | |
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning | |
UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality | |
Identity Lock: Locking API Fine-tuned LLMs With Identity-based Wake Words | |
Basic Category Usage in Vision Language Models | |
Investigating Human-Aligned Large Language Model Uncertainty | |
A Review of DeepSeek Models' Key Innovative Techniques | |
Agents Play Thousands of 3D Video Games | |
Free-form language-based robotic reasoning and grasping | |
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey | |
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? | |
APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs | |
xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference | |
SuperBPE: Space Travel for Language Models | |
$\phi$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation | |
EXAONE Deep: Reasoning Enhanced Language Models | |
Visualizing Thought: Conceptual Diagrams Enable Robust Planning in LMMs | |
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models | |
Frac-Connections: Fractional Extension of Hyper-Connections | |
PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | |
MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification | |
Auditing language models for hidden objectives | |
Aligning Multimodal LLM with Human Preference: A Survey | |
Measuring AI Ability to Complete Long Tasks | |
Temporal Consistency for LLM Reasoning Process Error Identification | |
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts | |
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM | |
DAPO: An Open-Source LLM Reinforcement Learning System at Scale | |
RWKV-7 "Goose" with Expressive Dynamic State Evolution | |
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning | |
LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws | |
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding | |
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era | |
PENCIL: Long Thoughts with Short Memory | |
Pensez: Less Data, Better Reasoning -- Rethinking French LLM | |
MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs | |
Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs | |
Learning to Inference Adaptively for Multimodal Large Language Models | |
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models | |
EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees | |
CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving | |
Reversal Blessing: Thinking Backward May Outpace Thinking Forward in Multi-choice Questions | |
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks | |
Optimizing Retrieval Strategies for Financial Question Answering Documents in Retrieval-Augmented Generation Systems | |
ELTEX: A Framework for Domain-Driven Synthetic Data Generation | |
Enhancing Code LLM Training with Programmer Attention | |
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning | |
ViSpeak: Visual Instruction Feedback in Streaming Videos | |
STEVE: A Step Verification Pipeline for Computer-use Agent Training | |
LEGION: Learning to Ground and Explain for Synthetic Image Detection | |
GKG-LLM: A Unified Framework for Generalized Knowledge Graph Construction | |
MetaLadder: Ascending Mathematical Solution Quality via Analogical-Problem Reasoning Transfer | |
VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity | |
SkyLadder: Better and Faster Pretraining via Context Window Scheduling | |
LLM-FE: Automated Feature Engineering for Tabular Data with LLMs as Evolutionary Optimizers | |
Inside-Out: Hidden Factual Knowledge in LLMs | |
The KoLMogorov Test: Compression by Code Generation | |
LLM-Mediated Guidance of MARL Systems | |
XAttention: Block Sparse Attention with Antidiagonal Scoring | |
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models | |
Survey on Evaluation of LLM-based Agents | |
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners | |
LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates | |
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language Models | |
Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning | |
Reinforcement Learning for Reasoning in Small LLMs: What Works and What Doesn't | |
MathFusion: Enhancing Mathematical Problem-solving of LLM through Instruction Fusion | |
CLS-RL: Image Classification with Rule-Based Reinforcement Learning | |
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | |
Mixture of Lookup Experts | |
FastCuRL: Curriculum Reinforcement Learning with Progressive Context Extension for Efficient Training of R1-like Reasoning Models | |
Vision-Speech Models: Teaching Speech Models to Converse about Images | |
Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning | |
BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity? | |
Why Personalizing Deep Learning-Based Code Completion Tools Matters | |
Where do Large Vision-Language Models Look at when Answering Questions? | |
Why Do Multi-Agent LLM Systems Fail? | |
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos? | |
RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness | |
Causal Discovery from Data Assisted by Large Language Models | |
Modifying Large Language Model Post-Training for Diverse Creative Writing | |
Computation Mechanism Behind LLM Position Generalization | |
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules | |
Dancing with Critiques: Enhancing LLM Reasoning with Stepwise Natural Language Self-Critique | |
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement | |
Capturing Individual Human Preferences with Reward Features | |
PVChat: Personalized Video Chat with One-Shot Learning | |
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving | |
MARS: A Multi-Agent Framework Incorporating Socratic Guidance for Automated Prompt Optimization | |
MathFlow: Enhancing the Perceptual Flow of MLLMs for Visual Mathematical Problems | |
AlphaSpace: Enabling Robotic Actions through Semantic Tokenization and Symbolic Reasoning | |
GAEA: A Geolocation Aware Conversational Model | |
What Makes a Reward Model a Good Teacher? An Optimization Perspective | |
Can Large Vision Language Models Read Maps Like a Human? | |
From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration | |
Implicit Bias-Like Patterns in Reasoning Models | |
Thinking Machines: A Survey of LLM based Reasoning Strategies | |
StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs | |
A Comprehensive Survey on Long Context Language Modeling | |
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation | |
ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization | |
Exploring Training and Inference Scaling Laws in Generative Retrieval | |
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training | |
Video SimpleQA: Towards Factuality Evaluation in Large Video Language Models | |
FFN Fusion: Rethinking Sequential Computation in Large Language Models | |
Optimizing Language Models for Inference Time Objectives using Reinforcement Learning | |
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild | |
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders | |
Reasoning to Learn from Latent Thoughts | |
Defeating Prompt Injections by Design | |
Verbal Process Supervision Elicits Better Coding Agents | |
Context-Efficient Retrieval with Factual Decomposition | |
AgentRxiv: Towards Collaborative Autonomous Research | |
Mind with Eyes: from Language Reasoning to Multimodal Reasoning | |
Lost in Cultural Translation: Do LLMs Struggle with Math Across Cultural Contexts? | |
Vision-R1: Evolving Human-Free Alignment in Large Vision-Language Models via Vision-Guided Reinforcement Learning | |
SUNAR: Semantic Uncertainty based Neighborhood Aware Retrieval for Complex QA | |
Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models | |
Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM | |
Bayesian Teaching Enables Probabilistic Reasoning in Large Language Models | |
Language Models May Verbatim Complete Text They Were Not Explicitly Trained On | |
Variance Control via Weight Rescaling in LLM Pre-training | |
Judge Anything: MLLM as a Judge Across Any Modality | |
LEMMA: Learning from Errors for MatheMatical Advancement in LLMs | |
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | |
Typed-RAG: Type-aware Multi-Aspect Decomposition for Non-Factoid Question Answering | |
Does Context Matter? ContextualJudgeBench for Evaluating LLM-based Judges in Contextual Settings | |
CoLLM: A Large Language Model for Composed Image Retrieval | |
Mobile-MMLU: A Mobile Intelligence Language Understanding Benchmark | |
MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search | |
ADS-Edit: A Multimodal Knowledge Editing Dataset for Autonomous Driving Systems | |
Dewey Long Context Embedding Model: A Technical Report | |
Fully Autonomous AI Agents Should Not be Developed | |
Open Deep Search: Democratizing Search with Open-source Reasoning Agents | |
Efficient Model Development through Fine-tuning Transfer | |
LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning? | |
LogQuant: Log-Distributed 2-Bit Quantization of KV Cache with Superior Accuracy Preservation | |
LookAhead Tuning: Safer Language Models via Partial Answer Previews | |
Scaling Vision Pre-Training to 4K Resolution | |
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators | |
Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking | |
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation | |
Learning to chain-of-thought with Jensen's evidence lower bound | |
Scaling Laws of Synthetic Data for Language Models | |
ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning | |
Can Vision-Language Models Answer Face to Face Questions in the Real-World? | |
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models | |
Overcoming Vocabulary Mismatch: Vocabulary-agnostic Teacher Guided Language Modeling | |
xKV: Cross-Layer SVD for KV-Cache Compression | |
LLaVAction: evaluating and training multi-modal large language models for action recognition | |
When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making | |
Sparse Logit Sampling: Accelerating Knowledge Distillation in LLMs | |
MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding | |
FirePlace: Geometric Refinements of LLM Common Sense Reasoning for 3D Object Placement | |
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers | |
Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging | |
Video-R1: Reinforcing Video Reasoning in MLLMs | |
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation | |
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement Learning | |
Large Language Model Agent: A Survey on Methodology, Applications and Challenges | |
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models | |
ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition | |
Identifying Emerging Concepts in Large Corpora | |
ZJUKLAB at SemEval-2025 Task 4: Unlearning via Model Merging | |
FinAudio: A Benchmark for Audio Large Language Models in Financial Applications | |
LLPut: Investigating Large Language Models for Bug Report-Based Input Generation | |
ViLBench: A Suite for Vision-Language Process Reward Modeling | |
Qwen2.5-Omni Technical Report | |
Gemma 3 Technical Report | |
Overtrained Language Models Are Harder to Fine-Tune | |
Don't lie to your friends: Learning what you know from collaborative self-play | |
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs | |
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey | |
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation | |
RONA: Pragmatically Diverse Image Captioning with Coherence Relations | |
MARRO: Multi-headed Attention for Rhetorical Role Labeling in Legal Documents | |
New Trends for Modern Machine Translation with Large Reasoning Models | |
Compute Optimal Scaling of Skills: Knowledge vs Reasoning | |
MoC: Mixtures of Text Chunking Learners for Retrieval-Augmented Generation System | |
Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks | |
Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs | |
LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference | |
Training Plug-n-Play Knowledge Modules with Deep Context Distillation | |
Chain-of-Thought Reasoning In The Wild Is Not Always Faithful | |
HuixiangDou2: A Robustly Optimized GraphRAG Approach | |
A Survey on Post-training of Large Language Models | |
A Survey of Large Language Model Empowered Agents for Recommendation and Search: Towards Next-Generation Information Retrieval | |
Knowledge Updating? No More Model Editing! Just Selective Contextual Reasoning | |
Continual Pre-training of MoEs: How robust is your router? | |
SafeArena: Evaluating the Safety of Autonomous Web Agents | |
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models | |
Taxonomy, Opportunities, and Challenges of Representation Engineering for Large Language Models | |
Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs | |
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning | |
PDX: A Data Layout for Vector Similarity Search | |
Process-based Self-Rewarding Language Models | |
SoftMatcha: A Soft and Fast Pattern Matcher for Billion-Scale Corpus Searches | |
LLM as GNN: Graph Vocabulary Learning for Text-Attributed Graph Foundation Models | |
FANS -- Formal Answer Selection for Natural Language Math Reasoning Using Lean4 | |
Improving LLM-as-a-Judge Inference with the Judgment Distribution | |
Tabby: Tabular Data Synthesis with Language Models | |
How to Steer LLM Latents for Hallucination Detection? | |
Better Embeddings with Coupled Adam | |
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs | |
CoSMoEs: Compact Sparse Mixture of Experts | |
Steering Large Language Model Activations in Sparse Spaces | |
VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning | |
LLM Post-Training: A Deep Dive into Reasoning Large Language Models | |
Token-level Ensembling of Models with Different Vocabularies | |
WebFAQ: A Multilingual Collection of Natural Q&A Datasets for Dense Retrieval | |
FlexPrefill: A Context-Aware Sparse Attention Mechanism for Efficient Long-Sequence Inference | |
$Q\sharp$: Provably Optimal Distributional RL for LLM Post-Training | |
Distill Not Only Data but Also Rewards: Can Smaller Language Models Surpass Larger Ones? | |
Long-Context Inference with Retrieval-Augmented Speculative Decoding | |
Telephone Surveys Meet Conversational AI: Evaluating a LLM-Based Telephone Survey System at Scale | |
Implicit Search via Discrete Diffusion: A Study on Chess | |
Speculative Decoding and Beyond: An In-Depth Survey of Techniques | |
GRACE: A Granular Benchmark for Evaluating Model Calibration against Human Calibration | |
No, of course I can! Refusal Mechanisms Can Be Exploited Using Harmless Fine-Tuning Data | |
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs | |
Learning Code-Edit Embedding to Model Student Debugging Behavior | |
UQABench: Evaluating User Embedding for Prompting LLMs in Personalized Question Answering | |
Talking like Piping and Instrumentation Diagrams (P&IDs) | |
Letters from Future Self: Augmenting the Letter-Exchange Exercise with LLM-based Agents to Enhance Young Adults' Career Exploration | |
Seeing the Forest for the Trees: A Large Scale, Continuously Updating Meta-Analysis of Frontier LLMs | |
Automatic Prompt Optimization via Heuristic Search: A Survey | |
TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning | |
Unveiling and Causalizing CoT: A Causal Perspective | |
Bayesian Optimization for Controlled Image Editing via LLMs | |
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning | |
MM-PoisonRAG: Disrupting Multimodal RAG with Local and Global Poisoning Attacks | |
An Overview of Large Language Models for Statisticians | |
From System 1 to System 2: A Survey of Reasoning Large Language Models | |
LLM-QE: Improving Query Expansion by Aligning Large Language Models with Ranking Preferences | |
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | |
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries | |
Wrong Answers Can Also Be Useful: PlausibleQA -- A Large-Scale QA Dataset with Answer Plausibility Scores | |
Interrogating LLM design under a fair learning doctrine | |
The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination | |
Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models | |
LLMs in Mobile Apps: Practices, Challenges, and Opportunities | |
An Agent Framework for Real-Time Financial Information Searching with Large Language Models | |
Privacy Ripple Effects from Adding or Removing Personal Information in Language Model Training | |
Generalizing From Short to Long: Effective Data Synthesis for Long-Context Instruction Tuning | |
DReSD: Dense Retrieval for Speculative Decoding | |
SVDq: 1.25-bit and 410x Key Cache Compression for LLM Attention | |
LEDD: Large Language Model-Empowered Data Discovery in Data Lakes | |
Which Attention Heads Matter for In-Context Learning? | |
MuDAF: Long-Context Multi-Document Attention Focusing through Contrastive Learning on Attention Heads | |
How Do LLMs Perform Two-Hop Reasoning in Context? | |
Lost in Sequence: Do Large Language Models Understand Sequential Recommendation? | |
Evaluating Step-by-step Reasoning Traces: A Survey | |
Idiosyncrasies in Large Language Models | |
Fast or Better? Balancing Accuracy and Cost in Retrieval-Augmented Generation with Flexible User Control | |
TokenSkip: Controllable Chain-of-Thought Compression in LLMs | |
CONSTRUCTA: Automating Commercial Construction Schedules in Fabrication Facilities with Large Language Models | |
LLM Agents Making Agent Tools | |
SMART: Self-Aware Agent for Tool Overuse Mitigation | |
Nuclear Deployed: Analyzing Catastrophic Risks in Decision-making of Autonomous LLM Agents | |
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention | |
RAS: Retrieval-And-Structuring for Knowledge-Intensive LLM Generation | |
Do We Need to Verify Step by Step? Rethinking Process Supervision from a Theoretical Perspective | |
Solvable Dynamics of Self-Supervised Word Embeddings and the Emergence of Analogical Reasoning | |
Spectral Journey: How Transformers Predict the Shortest Path | |
IHEval: Evaluating Language Models on Following the Instruction Hierarchy | |
Human Decision-making is Susceptible to AI-driven Manipulation | |
A Comprehensive Review of Protein Language Models | |
Transforming Science with Large Language Models: A Survey on AI-assisted Scientific Discovery, Experimentation, Content Generation, and Evaluation | |
Developmentally-plausible Working Memory Shapes a Critical Period for Language Acquisition | |
SiriuS: Self-improving Multi-agent Systems via Bootstrapped Reasoning | |
CARROT: A Cost Aware Rate Optimal Router | |
Peri-LN: Revisiting Layer Normalization in the Transformer Architecture | |
Layer by Layer: Uncovering Hidden Representations in Language Models | |
VideoRAG: Retrieval-Augmented Generation with Extreme Long-Context Videos | |
Explaining Context Length Scaling and Bounds for Language Models | |
Eliciting Language Model Behaviors with Investigator Agents | |
Internal Activation as the Polar Star for Steering Unsafe LLM Behavior | |
MINT: Mitigating Hallucinations in Large Vision-Language Models via Token Reduction | |
Sigmoid Self-Attention is Better than Softmax Self-Attention: A Mixture-of-Experts Perspective | |
Do Large Multimodal Models Solve Caption Generation for Scientific Figures? Lessons Learned from SciCap Challenge 2023 | |
Rope to Nope and Back Again: A New Hybrid Attention Strategy | |
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs | |
Improving LLM Leaderboards with Psychometrical Methodology | |
Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key | |
ADAM-1: AI and Bioinformatics for Alzheimer's Detection and Microbiome-Clinical Data Integrations | |
Engineering LLM Powered Multi-agent Framework for Autonomous CloudOps | |
I Can Find You in Seconds! Leveraging Large Language Models for Code Authorship Attribution | |
Large Language Model Interface for Home Energy Management Systems | |
Enhancing Talent Employment Insights Through Feature Extraction with LLM Finetuning | |
Hierarchical Repository-Level Code Summarization for Business Applications Using Local LLMs | |
Flow: Modularized Agentic Workflow Automation | |
3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding | |
CodeCoR: An LLM-Based Self-Reflective Multi-Agent Framework for Code Generation | |
Visual Language Models as Operator Agents in the Space Domain | |
Large Language Models for Interpretable Mental Health Diagnosis | |
SafePowerGraph-LLM: Novel Power Grid Graph Embedding and Optimization with Large Language Models | |
Evaluating Agent-based Program Repair at Google | |
RadAlign: Advancing Radiology Report Generation with Vision-Language Concept Alignment | |
AIOpsLab: A Holistic Framework to Evaluate AI Agents for Enabling Autonomous Clouds | |
LLM-Net: Democratizing LLMs-as-a-Service through Blockchain-based Expert Networks | |
Touched by ChatGPT: Using an LLM to Drive Affective Tactile Interaction | |
How is Google using AI for internal code migrations? | |
Eliza: A Web3 friendly AI Agent Operating System | |
Using Pre-trained LLMs for Multivariate Time Series Forecasting | |
IntelEX: A LLM-driven Attack-level Threat Intelligence Extraction Framework | |
ConSim: Measuring Concept-Based Explanations' Effectiveness with Automated Simulatability | |
Debugging Without Error Messages: How LLM Prompting Strategy Affects Programming Error Explanation Effectiveness | |
MDSF: Context-Aware Multi-Dimensional Data Storytelling Framework based on Large Language Model | |
LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management | |
Efficient Reasoning Models: A Survey | |
Embracing Large Language Models in Traffic Flow Forecasting | |
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs | |
Semantic Steganography: A Framework for Robust and High-Capacity Information Hiding using Large Language Models | |
A Contextualized BERT model for Knowledge Graph Completion | |
WHAT-IF: Exploring Branching Narratives by Meta-Prompting Large Language Models | |
GeLoRA: Geometric Adaptive Ranks For Efficient LoRA Fine-tuning | |
ZigZagkv: Dynamic KV Cache Compression for Long-context Modeling based on Layer Uncertainty | |
Lexico: Extreme KV Cache Compression via Sparse Coding over Universal Dictionaries | |
EMS: Adaptive Evict-then-Merge Strategy for Head-wise KV Cache Compression Based on Global-Local Importance | |
SwarmGPT-Primitive: A Language-Driven Choreographer for Drone Swarms Using Safe Motion Primitive Composition | |
CMT: A Memory Compression Method for Continual Knowledge Learning of Large Language Models | |
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models | |
Asynchronous LLM Function Calling | |
AutoReason: Automatic Few-Shot Reasoning Decomposition | |
Taming Sensitive Weights : Noise Perturbation Fine-tuning for Robust LLM Quantization | |
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families | |
Antidistillation Sampling | |
LLM-BIP: Structured Pruning for Large Language Models with Block-Wise Forward Importance Propagation | |
Beyond pip install: Evaluating LLM Agents for the Automated Installation of Python Projects | |
iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models | |
SiReRAG: Indexing Similar and Related Information for Multihop Reasoning | |
Enhanced Computationally Efficient Long LoRA Inspired Perceiver Architectures for Auto-Regressive Language Modeling | |
Does RLHF Scale? Exploring the Impacts From Data, Model, and Method | |
LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods | |
HiVeGen -- Hierarchical LLM-based Verilog Generation for Scalable Chip Design | |
Rethinking Time Series Forecasting with LLMs via Nearest Neighbor Contrastive Learning | |
Cross-Self KV Cache Pruning for Efficient Vision-Language Inference | |
MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification | |
DataLab: A Unified Platform for LLM-Powered Business Intelligence | |
Towards Adaptive Mechanism Activation in Language Agent | |
InstCache: A Predictive Cache for LLM Serving | |
DFRot: Achieving Outlier-Free and Massive Activation-Free for Rotated LLMs with Refined Rotation | |
ARChef: An iOS-Based Augmented Reality Cooking Assistant Powered by Multimodal Gemini LLM | |
Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects | |
AIDetx: a compression-based method for identification of machine-learning generated text | |
Build An Influential Bot In Social Media Simulations With Large Language Models | |
Context-Aware Membership Inference Attacks against Pre-trained Large Language Models | |
Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning | |
Marconi: Prefix Caching for the Era of Hybrid LLMs | |
CoVis: A Collaborative Framework for Fine-grained Graphic Visual Understanding | |
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | |
MiniKV: Pushing the Limits of LLM Inference via 2-Bit Layer-Discriminative KV Cache | |
FlexiBit: Fully Flexible Precision Bit-parallel Accelerator Architecture for Arbitrary Mixed Precision AI | |
Automated Test Transfer Across Android Apps Using Large Language Models | |
Enhancing Character-Level Understanding in LLMs through Token Internal Structure Learning | |
Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models | |
MixPE: Quantization and Hardware Co-design for Efficient LLM Inference | |
Reassessing Layer Pruning in LLMs: New Insights and Methods | |
Measuring Bullshit in the Language Games played by ChatGPT | |
AttriBoT: A Bag of Tricks for Efficiently Approximating Leave-One-Out Context Attribution | |
Multiverse of Greatness: Generating Story Branches with LLMs | |
DRPruning: Efficient Large Language Model Pruning through Distributionally Robust Optimization | |
How Good is ChatGPT at Audiovisual Deepfake Detection: A Comparative Study of ChatGPT, AI Models and Human Perception | |
One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity | |
Accelerated AI Inference via Dynamic Execution Methods | |
Protecting Privacy in Multimodal Large Language Models with MLLMU-Bench | |
Centaur: a foundation model of human cognition | |
Markov Chain of Thought for Efficient Mathematical Reasoning | |
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling | |
Exact Byte-Level Probabilities from Tokenized Language Models for FIM-Tasks and Model Ensembles | |
Compositional Entailment Learning for Hyperbolic Vision-Language Models | |
Uplifting Lower-Income Data: Strategies for Socioeconomic Perspective Shifts in Large Multi-modal Models | |
ImProver: Agent-Based Automated Proof Optimization | |
ReGenesis: LLMs can Grow into Reasoning Generalists via Self-Improvement | |
SELP: Generating Safe and Efficient Task Plans for Robot Agents with Large Language Models | |
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models | |
AdaPPA: Adaptive Position Pre-Fill Jailbreak Attack Approach Targeting LLMs | |
What Would You Ask When You First Saw $a^2+b^2=c^2$? Evaluating LLM on Curiosity-Driven Questioning | |
Strategies for Improving NL-to-FOL Translation with LLMs: Data Generation, Incremental Fine-Tuning, and Verification | |
Advertiser Content Understanding via LLMs for Google Ads Safety | |
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits | |
Logically Consistent Language Models via Neuro-Symbolic Integration | |
Evaluating Defences against Unsafe Feedback in RLHF | |
SpecEval: Evaluating Code Comprehension in Large Language Models via Program Specifications | |
Human Interest or Conflict? Leveraging LLMs for Automated Framing Analysis in TV Shows | |
Profiling Patient Transcript Using Large Language Model Reasoning Augmentation for Alzheimer's Disease Detection | |
On the consistent reasoning paradox of intelligence and optimal trust in AI: The power of 'I don't know' | |
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | |
Is your benchmark truly adversarial? AdvScore: Evaluating Human-Grounded Adversarialness | |
GenoTEX: A Benchmark for Automated Gene Expression Data Analysis in Alignment with Bioinformaticians | |
Why Would You Suggest That? Human Trust in Language Model Responses | |
SLMRec: Distilling Large Language Models into Small for Sequential Recommendation | |
LLMs can Find Mathematical Reasoning Mistakes by Pedagogical Chain-of-Thought | |
Does Mapo Tofu Contain Coffee? Probing LLMs for Food-related Cultural Knowledge | |
How Far Are We on the Decision-Making of LLMs? Evaluating LLMs' Gaming Ability in Multi-Agent Environments | |
Prompt Optimization via Adversarial In-Context Learning | |
PHYSICS: Benchmarking Foundation Models on University-Level Physics Problem Solving | |
Generative Sequential Recommendation with GPTRec | |
Does Vision Accelerate Hierarchical Generalization in Neural Language Learners? | |
Hierarchical LLMs In-the-loop Optimization for Real-time Multi-Robot Target Tracking under Unknown Hazards | |
ChefFusion: Multimodal Foundation Model Integrating Recipe and Food Image Generation | |
MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts | |
LLM-Powered Text Simulation Attack Against ID-Free Recommender Systems | |
Art and Science of Quantizing Large-Scale Models: A Comprehensive Overview | |
LlamaF: An Efficient Llama2 Architecture Accelerator on Embedded FPGAs | |
Investigating Context-Faithfulness in Large Language Models: The Roles of Memory Strength and Evidence Style | |
Self-Attention Limits Working Memory Capacity of Transformer-Based Models | |
E2Map: Experience-and-Emotion Map for Self-Reflective Robot Navigation with Language Models | |
Confidence Estimation for LLM-Based Dialogue State Tracking | |
Inf-MLLM: Efficient Streaming Inference of Multimodal Large Language Models on a Single GPU | |
FP-VEC: Fingerprinting Large Language Models via Efficient Vector Addition | |
LLaQo: Towards a Query-Based Coach in Expressive Music Performance Assessment | |
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions | |
Intelligent LiDAR Navigation: Leveraging External Information and Semantic Maps with LLM as Copilot | |
Faster Speech-LLaMA Inference with Multi-token Prediction | |
Enabling Cost-Effective UI Automation Testing with Retrieval-Based LLMs: A Case Study in WeChat | |
Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches | |
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks | |
Geometric-Averaged Preference Optimization for Soft Preference Labels | |
Optimal Workload Placement on Multi-Instance GPUs | |
Towards Agentic AI on Particle Accelerators | |
Enhancing Long Video Understanding via Hierarchical Event-Based Memory | |
Algorithmic Language Models with Neurally Compiled Libraries | |
Harder Tasks Need More Experts: Dynamic Routing in MoE Models | |
Shared Global and Local Geometry of Language Model Embeddings | |
Multi-head Reward Aggregation Guided by Entropy | |
How do language models learn facts? Dynamics, curricula and hallucinations | |
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond | |
debug-gym: A Text-Based Environment for Interactive Debugging | |
SWI: Speaking with Intent in Large Language Models | |
From Deep Learning to LLMs: A survey of AI in Quantitative Investment | |
RALLRec+: Retrieval Augmented Large Language Model Recommendation with Reasoning | |
Speculative Decoding for Verilog: Speed and Quality, All in One | |
Bias-Augmented Consistency Training Reduces Biased Reasoning in Chain-of-Thought | |
Writing as a testbed for open ended agents | |
A Survey of Large Language Model Agents for Question Answering | |
Self-Organizing Graph Reasoning Evolves into a Critical State for Continuous Discovery Through Structural-Semantic Dynamics | |
Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models | |
Is a Good Foundation Necessary for Efficient Reinforcement Learning? The Computational Role of the Base Model in Exploration | |
Improving Low-Resource Retrieval Effectiveness using Zero-Shot Linguistic Similarity Transfer | |
A Refined Analysis of Massive Activations in LLMs | |
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback | |
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow | |
On Large Multimodal Models as Open-World Image Classifiers | |
ReFeed: Multi-dimensional Summarization Refinement with Reflective Reasoning on Feedback | |
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation | |
4D-Bench: Benchmarking Multi-modal Large Language Models for 4D Object Understanding | |
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning | |
Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models | |
Challenges and Paths Towards AI for Software Engineering | |
Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities | |
Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad | |
Evaluating LLM-based Agents for Multi-Turn Conversations: A Survey | |
Supposedly Equivalent Facts That Aren't? Entity Frequency in Pre-training Induces Asymmetry in LLMs | |
Sharpe Ratio-Guided Active Learning for Preference Optimization in RLHF | |
Is Best-of-N the Best of Them? Coverage, Scaling, and Optimality in Inference-Time Alignment | |
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features | |
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for Large Language Models | |
Effectively Controlling Reasoning Models through Thinking Intervention | |
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model | |
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models | |
TeleAntiFraud-28k: An Audio-Text Slow-Thinking Dataset for Telecom Fraud Detection | |
Expanding RL with Verifiable Rewards Across Diverse Domains | |
KOFFVQA: An Objectively Evaluated Free-form VQA Benchmark for Large Vision-Language Models in the Korean Language | |
Efficient Inference for Large Reasoning Models: A Survey | |
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python Code | |
LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding | |
Entropy-Based Adaptive Weighting for Self-Training | |
Decoupling Angles and Strength in Low-rank Adaptation | |
UPME: An Unsupervised Peer Review Framework for Multimodal Large Language Model Evaluation | |
Rec-R1: Bridging Generative Large Language Models and User-Centric Recommendation Systems via Reinforcement Learning | |
Better wit than wealth: Dynamic Parametric Retrieval Augmented Generation for Test-time Knowledge Enhancement | |
Adaptive Layer-skipping in Pre-trained LLMs | |
RARE: Retrieval-Augmented Reasoning Modeling | |
TRA: Better Length Generalisation with Threshold Relative Attention | |
ActionStudio: A Lightweight Framework for Data and Training of Large Action Models | |
PAVE: Patching and Adapting Video Large Language Models | |
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning | |
Discovering Knowledge Deficiencies of Language Models on Massive Knowledge Base | |
Multi-Token Attention | |
Recitation over Reasoning: How Cutting-Edge Language Models Can Fail on Elementary School-Level Reasoning Problems? | |
m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models | |
Z1: Efficient Test-time Scaling with Code | |
Command A: An Enterprise-Ready Large Language Model | |
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources | |
TimeLMs: Diachronic Language Models from Twitter | |
Inference-Time Scaling for Complex Tasks: Where We Stand and What Lies Ahead | |
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs | |
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation | |
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 | |
AdaMMS: Model Merging for Heterogeneous Multimodal Large Language Models with Unsupervised Coefficient Optimization | |
CodeARC: Benchmarking Reasoning Capabilities of LLM Agents for Inductive Program Synthesis | |
JudgeLRM: Large Reasoning Models as a Judge | |
Towards Trustworthy GUI Agents: A Survey | |
OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts | |
Mind the Gap: Understanding the Modality Gap in Multi-modal Contrastive Representation Learning | |
LARGE: Legal Retrieval Augmented Generation Evaluation Tool | |
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL | |
Token embeddings violate the manifold hypothesis | |
Hawkeye: Efficient Reasoning with Model Collaboration | |
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models | |
Improved Visual-Spatial Reasoning via R1-Zero-Like Training | |
PaperBench: Evaluating AI's Ability to Replicate AI Research | |
YourBench: Easy Custom Evaluation Sets for Everyone | |
Safeguarding Vision-Language Models: Mitigating Vulnerabilities to Gaussian Noise in Perturbation-based Attacks | |
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction | |
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems | |
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations | |
Inference-Time Scaling for Generalist Reward Modeling | |
Efficient Model Selection for Time Series Forecasting via LLMs | |
Critical Thinking: Which Kinds of Complexity Govern Optimal Reasoning Length? | |
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement | |
Bhakti: A Lightweight Vector Database Management System for Endowing Large Language Models with Semantic Search Capabilities and Memory | |
MLKV: Efficiently Scaling up Large Embedding Model Training with Disk-based Key-Value Storage | |
ThinkPrune: Pruning Long Chain-of-Thought of LLMs via Reinforcement Learning | |
Medical large language models are easily distracted | |
VerifiAgent: a Unified Verification Agent in Language Model Reasoning | |
DASH: Detection and Assessment of Systematic Hallucinations of VLMs | |
Understanding R1-Zero-Like Training: A Critical Perspective | |
Affordable AI Assistants with Knowledge Graph of Thoughts | |
ZClip: Adaptive Spike Mitigation for LLM Pre-Training | |
Scaling Analysis of Interleaved Speech-Text Language Models | |
MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | |
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning | |
ShortV: Efficient Multimodal Large Language Models by Freezing Visual Tokens in Ineffective Layers | |
Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models | |
OpenCodeReasoning: Advancing Data Distillation for Competitive Coding | |
Comment Staytime Prediction with LLM-enhanced Comment Understanding | |
Scaling Laws in Scientific Discovery with AI and Robot Scientists | |
Analyzing the Generalization and Reliability of Steering Vectors | |
Why do LLMs attend to the first token? | |
Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models | |
UNDO: Understanding Distillation as Optimization | |
How Do Large Language Monkeys Get Their Power (Laws)? | |
Multi-Agent Multimodal Models for Multicultural Text to Image Generation | |
Large Language Models Pass the Turing Test | |
Mixture of Routers | |
MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models | |
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge Refinement | |
Agentic Knowledgeable Self-awareness | |
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning | |
Soft Policy Optimization: Online Off-Policy RL for Sequence Models | |
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving | |
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay | |
BEATS: Bias Evaluation and Assessment Test Suite for Large Language Models | |
MegaMath: Pushing the Limits of Open Math Corpora | |
Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence | |
Slow-Fast Architecture for Video Multi-Modal Large Language Models | |
TransMamba: Flexibly Switching between Transformer and Mamba | |
Do Larger Language Models Imply Better Reasoning? A Pretraining Scaling Law for Reasoning | |
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning | |
Knowledge-Instruct: Effective Continual Pre-training from Limited Data using Instructions | |
URECA: Unique Region Caption Anything | |
Nemotron-H: A Family of Accurate and Efficient Hybrid Mamba-Transformer Models | |
Align to Structure: Aligning Large Language Models with Structural Information | |
DeepResearcher: Scaling Deep Research via Reinforcement Learning in Real-world Environments | |
Understanding Aha Moments: from External Observations to Internal Mechanisms | |
Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings | |
LiveVQA: Live Visual Knowledge Seeking | |
M-Prometheus: A Suite of Open Multilingual LLM Judges | |
A Llama walks into the 'Bar': Efficient Supervised Fine-Tuning for Legal Reasoning in the Multi-state Bar Exam | |
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models | |
Are You Getting What You Pay For? Auditing Model Substitution in LLM APIs | |
Clinical ModernBERT: An efficient and long context encoder for biomedical text | |
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model | |
RaanA: A Fast, Flexible, and Data-Efficient Post-Training Quantization Algorithm | |
Robustly identifying concepts introduced during chat fine-tuning using crosscoders | |
SmolVLM: Redefining small and efficient multimodal models | |
DoCIA: An Online Document-Level Context Incorporation Agent for Speech Translation | |
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks | |
T1: Tool-integrated Self-verification for Test-time Compute Scaling in Small Language Models | |
GlotEval: A Test Suite for Massively Multilingual Evaluation of Large Language Models | |
Rethinking Multilingual Continual Pretraining: Data Mixing for Adapting LLMs Across Languages and Resources | |
DiaTool-DPO: Multi-Turn Direct Preference Optimization for Tool-Augmented Large Language Models | |
Mixture-of-Personas Language Models for Population Simulation | |
Constitution or Collapse? Exploring Constitutional AI with Llama 3-8B | |
Rethinking Reflection in Pre-Training | |
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models | |
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1) | |
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use | |
Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models | |
Kimi-VL Technical Report | |
Saliency-driven Dynamic Token Pruning for Large Language Models | |
Retro-Search: Exploring Untaken Paths for Deeper and Efficient Reasoning | |
Gating is Weighting: Understanding Gated Linear Attention through In-context Learning | |
Distillation and Refinement of Reasoning in Small Language Models for Document Re-ranking | |
EnrichIndex: Using LLMs to Enrich Retrieval Indices Offline | |
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention | |
An Empirical Study of GPT-4o Image Generation Capabilities | |
Generative Evaluation of Complex Reasoning in Large Language Models | |
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation | |
Retrieval Augmented Generation with Collaborative Filtering for Personalized Text Generation | |
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought | |
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models | |
COIG-P: A High-Quality and Large-Scale Chinese Preference Dataset for Alignment with Human Values | |
Leanabell-Prover: Posttraining Scaling in Formal Reasoning | |
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | |
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning | |
From 128K to 4M: Efficient Training of Ultra-Long Context Large Language Models | |
Lattice: Learning to Efficiently Compress the Memory | |
Is the Reversal Curse a Binding Problem? Uncovering Limitations of Transformers from a Basic Generalization Failure | |
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens | |
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills | |
OmniCaptioner: One Captioner to Rule Them All | |
A Sober Look at Progress in Language Model Reasoning: Pitfalls and Paths to Reproducibility | |
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill? | |
Exact Unlearning of Finetuning Data via Model Merging at Scale | |
Self-Steering Language Models | |
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning | |
RuOpinionNE-2024: Extraction of Opinion Tuples from Russian News Texts | |
An Investigation of Prompt Variations for Zero-shot LLM-based Rankers | |
Rethinking RoPE: A Mathematical Blueprint for N-dimensional Positional Encoding | |
Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting | |
Pretraining Language Models for Diachronic Linguistic Change Discovery | |
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models | |
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling | |
CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers | |
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | |
MM-IFEngine: Towards Multimodal Instruction Following | |
MemInsight: Autonomous Memory Augmentation for LLM Agents | |
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning | |
MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations | |
A System for Comprehensive Assessment of RAG Frameworks | |
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning | |
OSCAR: Online Soft Compression And Reranking | |
Relevance Isn't All You Need: Scaling RAG Systems With Inference-Time Compute Via Multi-Criteria Reranking | |
Towards Visual Text Grounding of Multimodal Large Language Model | |
To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning | |
Self-Evolving Multi-Agent Simulations for Realistic Clinical Interactions | |
Current and Future Use of Large Language Models for Knowledge Work | |
Synthetic Data Generation Using Large Language Models: Advances in Text and Code | |
Dynamic Cheatsheet: Test-Time Learning with Adaptive Memory | |
SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning | |
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model | |
Task-Circuit Quantization: Leveraging Knowledge Localization and Interpretability for Compression | |
Alice: Proactive Learning with Teacher's Demonstrations for Weak-to-Strong Generalization | |
A Stronger Mixture of Low-Rank Experts for Fine-Tuning Foundation Models | |
How do Large Language Models Understand Relevance? A Mechanistic Interpretability Perspective | |
LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking | |
Towards Distribution Matching between Collaborative and Language Spaces for Generative Recommendation | |
On the Consistency of Multilingual Context Utilization in Retrieval-Augmented Generation | |
Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining | |
GOLLuM: Gaussian Process Optimized LLMs -- Reframing LLM Finetuning through Bayesian Optimization | |
Increasing happiness through conversations with artificial intelligence | |
End-To-End Memory Networks | |
Revisiting Prompt Optimization with Large Reasoning Models-A Case Study on Event Extraction | |
Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization | |
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance | |
SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning | |
Out of Style: RAG's Fragility to Linguistic Variation | |
SAEs $\textit{Can}$ Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs | |
CoRAG: Collaborative Retrieval-Augmented Generation | |
Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs | |
Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models | |
Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images | |
SWAN-GPT: An Efficient and Scalable Approach for Long-Context Language Modeling | |
On The Landscape of Spoken Language Models: A Comprehensive Survey | |
SEAL: Steerable Reasoning Calibration of Large Language Models for Free | |
Perception-R1: Pioneering Perception Policy with Reinforcement Learning | |
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models | |
LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models | |
S1-Bench: A Simple Benchmark for Evaluating System 1 Thinking Capability of Large Reasoning Models | |
SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users | |
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model | |
KeepKV: Eliminating Output Perturbation in KV Cache Compression for Efficient LLMs Inference | |
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding | |
Executable Functional Abstractions: Inferring Generative Programs for Advanced Math Problems | |
DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training | |
EmoAgent: Assessing and Safeguarding Human-AI Interaction for Mental Health Safety | |
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning | |
Do Reasoning Models Show Better Verbalized Calibration? | |
AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories | |
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning | |
PRIMA.CPP: Speeding Up 70B-Scale LLM Inference on Low-Resource Everyday Home Clusters | |
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability | |
Evaluating the Generalization Capabilities of Large Language Models on Code Reasoning | |
Amuse: Human-AI Collaborative Songwriting with Multimodal Inspirations | |
Language Models can Evaluate Themselves via Probability Discrepancy | |
M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models | |
LLM Can be a Dangerous Persuader: Empirical Study of Persuasion Safety in Large Language Models | |
A Survey of Personalization: From RAG to Agent | |
Reasoning Models Can Be Effective Without Thinking | |
RAKG: Document-level Retrieval Augmented Knowledge Graph Construction | |
VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents | |
Iterative Self-Training for Code Generation via Reinforced Re-Ranking | |
(How) Do reasoning models reason? | |
How new data permeates LLM knowledge and how to dilute it | |
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search | |
UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation | |
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization? | |
Teaching Large Language Models to Reason through Learning and Forgetting | |
MDK12-Bench: A Multi-Discipline Benchmark for Evaluating Reasoning in Multimodal Large Language Models | |
MIEB: Massive Image Embedding Benchmark | |
Beyond Memorization: Mapping the Originality-Quality Frontier of Language Models | |
Long Context In-Context Compression by Getting to the Gist of Gisting | |
From Tokens to Lattices: Emergent Lattice Structures in Language Models | |
MCP Safety Audit: LLMs with the Model Context Protocol Allow Major Security Exploits | |
Language Model Alignment in Multilingual Trolley Problems | |
Robust and Fine-Grained Detection of AI Generated Texts | |
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | |
DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning | |
TextArena | |
Reinforcing Compositional Retrieval: Retrieving Step-by-Step for Composing Informative Contexts | |
Efficient Hybrid Language Model Compression through Group-Aware SSM Pruning | |
DataDecide: How to Predict Best Pretraining Data with Small Experiments | |
A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce | |
Looking beyond the next token | |
LazyReview: A Dataset for Uncovering Lazy Thinking in NLP Peer Reviews | |
ReZero: Enhancing LLM search ability by trying one-more-time | |
Understanding LLMs' Cross-Lingual Context Retrieval: How Good It Is And Where It Comes From | |
How Instruction and Reasoning Data shape Post-Training: Data Quality through the Lens of Layer-wise Gradients | |
Better Estimation of the KL Divergence Between Language Models | |
Efficient Process Reward Model Training via Active Learning | |
HeteRAG: A Heterogeneous Retrieval-augmented Generation Framework with Decoupled Knowledge Representations | |
xVerify: Efficient Answer Verifier for Reasoning Model Evaluations | |
Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding | |
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer | |
Multimodal Long Video Modeling Based on Temporal Dynamic Context | |
VisualPuzzles: Decoupling Multimodal Reasoning Evaluation from Domain Knowledge | |
Heimdall: test-time scaling on the generative verification | |
RealHarm: A Collection of Real-World Language Model Application Failures | |
AI-University: An LLM-based platform for instructional alignment to scientific classrooms | |
Genius: A Generalizable and Purely Unsupervised Self-Training Framework For Advanced Reasoning | |
Word Embeddings Track Social Group Changes Across 70 Years in China | |
Adaptive Computation Pruning for the Forgetting Transformer | |
PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs | |
Liquid: Language Models are Scalable and Unified Multi-modal Generators | |
BitNet b1.58 2B4T Technical Report | |
d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning | |
A Library of LLM Intrinsics for Retrieval-Augmented Generation | |
NodeRAG: Structuring Graph-based RAG with Heterogeneous Nodes | |
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs | |
AlayaDB: The Data Foundation for Efficient and Effective Long-context LLM Inference | |
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning | |
Syzygy of Thoughts: Improving LLM CoT with the Minimal Free Resolution | |
SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models | |
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? | |
FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents | |
A Survey of Multimodal Retrieval-Augmented Generation | |
MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges? | |
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training | |
Exploring Expert Failures Improves LLM Agent Tuning | |
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models | |
Retrieval-Augmented Generation with Conflicting Evidence | |
Estimating Optimal Context Length for Hybrid Retrieval-augmented Multi-document Summarization | |
HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation | |
FocusedAD: Character-centric Movie Audio Description | |
Improving Instruct Models for Free: A Study on Partial Adaptation | |
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | |
Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian | |
A Strategic Coordination Framework of Small LLMs Matches Large LLMs in Data Synthesis | |
Sleep-time Compute: Beyond Inference Scaling at Test-time | |
MetaSynth: Meta-Prompting-Driven Agentic Scaffolds for Diverse Synthetic Data Generation | |
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding | |
Can Pre-training Indicators Reliably Predict Fine-tuning Outcomes of LLMs? | |
Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time | |
Reinforcement Learning from Human Feedback | |
Reconstructing Sepsis Trajectories from Clinical Case Reports using LLMs: the Textual Time Series Corpus for Sepsis | |
LLMTaxo: Leveraging Large Language Models for Constructing Taxonomy of Factual Claims from Social Media | |
MIB: A Mechanistic Interpretability Benchmark | |
LitLLMs, LLMs for Literature Review: Are we there yet? | |
VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers | |
Vending-Bench: A Benchmark for Long-Term Coherence of Autonomous Agents | |
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? | |
MIG: Automatic Data Selection for Instruction Tuning by Maximizing Information Gain in Semantic Space | |
Generative AI Act II: Test Time Scaling Drives Cognition Engineering | |
Could Thinking Multilingually Empower LLM Reasoning? | |
Thought Manipulation: External Thought Can Be Efficient for Large Reasoning Models | |
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations |
drcuiu commented on Feb 1, 2025:
Act as a radiologist and review the following X-rays for me: [6-year-old patient with recurrent bronchial attacks and wheezing for the past 24 hours. Optimize this prompt for the best results, and ask any questions you consider necessary before proceeding.]