tyh382596868's Collections
daily paper
ReVSeg: Incentivizing the Reasoning Chain for Video Segmentation with Reinforcement Learning
Paper • 2512.02835 • Published • 9

Joint 3D Geometry Reconstruction and Motion Generation for 4D Synthesis from a Single Image
Paper • 2512.05044 • Published • 16

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Paper • 2512.05591 • Published • 16

SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling
Paper • 2512.05343 • Published • 24

World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty
Paper • 2512.05927 • Published • 11

Voxify3D: Pixel Art Meets Volumetric Rendering
Paper • 2512.07834 • Published • 43

EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing
Paper • 2512.06065 • Published • 28

Vector Quantization using Gaussian Variational Autoencoder
Paper • 2512.06609 • Published • 1

Relational Visual Similarity
Paper • 2512.07833 • Published • 24

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Paper • 2512.08765 • Published • 128

OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory
Paper • 2512.07802 • Published • 43

ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models
Paper • 2512.07843 • Published • 21

TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models
Paper • 2512.08153 • Published • 7

SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos
Paper • 2512.08406 • Published • 2

MoCapAnything: Unified 3D Motion Capture for Arbitrary Skeletons from Monocular Videos
Paper • 2512.10881 • Published • 29

Evaluating Gemini Robotics Policies in a Veo World Simulator
Paper • 2512.10675 • Published • 17

Qwen3-VL Technical Report
Paper • 2511.21631 • Published • 148

DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper • 2512.02556 • Published • 244

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
Paper • 2511.18538 • Published • 282

PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing
Paper • 2512.02589 • Published • 68

Deep Forcing: Training-Free Long Video Generation with Deep Sink and Participative Compression
Paper • 2512.05081 • Published • 30

RoboTracer: Mastering Spatial Trace with Reasoning in Vision-Language Models for Robotics
Paper • 2512.13660 • Published • 36

MMGR: Multi-Modal Generative Reasoning
Paper • 2512.14691 • Published • 114

MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives
Paper • 2512.14699 • Published • 27

Paper • 2512.13961 • Published • 22

Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed
Paper • 2512.14067 • Published • 13

The World is Your Canvas: Painting Promptable Events with Reference Images, Trajectories, and Text
Paper • 2512.16924 • Published • 25

Trainable Log-linear Sparse Attention for Efficient Diffusion Transformers
Paper • 2512.16615 • Published • 4

JustRL: Scaling a 1.5B LLM with a Simple RL Recipe
Paper • 2512.16649 • Published • 24

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models
Paper • 2512.16561 • Published • 19

PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence
Paper • 2512.16793 • Published • 72

TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times
Paper • 2512.16093 • Published • 91

Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models
Paper • 2512.20557 • Published • 49

Nemotron 3 Nano: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
Paper • 2512.20848 • Published • 30

NVIDIA Nemotron 3: Efficient and Open Intelligence
Paper • 2512.20856 • Published • 29

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation
Paper • 2512.17012 • Published • 42