#Multimodal

15 posts

ByteDance Open-Sources BAGEL: 7B Multimodal Model for Image Gen, Editing, Understanding

DEV Community: deeplearning·gentic news·3 days ago
#WCyyNMm1
#dev #models #bagel #bytedance #multimodal #model

ByteDance has open-sourced BAGEL, a 7B multimodal model for image generation, editing, style transfer, and understanding, under the Apache 2.0 license.

Gemini API File Search: Enhanced Multimodal Capabilities with Embedding 2, Including Open-Source LINE Bot Implementation

DEV Community·Evan Lin·21 days ago
#zHqbIXtX
#pitfall #comment #api #gemini #file #multimodal


GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

arXiv.org·[Submitted on 29 Apr 2026]·27 days ago
#OJSFQhoz
#arxiv #wang #multimodal #zhang #yang #turbo

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive,…

DeepSeek Finally "Opens Its Eyes": Multimodal Image Recognition Goes Live, the Last Missing Piece for Chinese LLMs

DEV Community·蔡俊鹏·about 1 month ago
#3qr5Qt4B


Nvidia is no longer just selling the shovels. Nemotron 3 Nano Omni is the company’s most aggressive move into AI models.

The Next Web·Alina Maria Stan·about 1 month ago
#b3MhJ2aU

Nvidia released Nemotron 3 Nano Omni on Tuesday, an open-weight multimodal AI model that unifies vision, audio, and language understanding in a single architecture designed to power autonomous AI agents on edge devices.…

Applying Multimodal Biological Foundation Models Across Therapeutics and Patient Care

DEV Community·Icarax·about 1 month ago
#hoMa7azJ
#wiredai #llms #ai #multimodal #learning #biofms


FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

Google DeepMind·The FACTS team·about 1 month ago
#R2PU6XJp

The FACTS Benchmark Suite systematically evaluates the factuality of large language models (LLMs) across three areas: parametric, search, and multimodal reasoning.
