Canonicalization Objaverse LVIS

One-shot 3D Object Canonicalization based on Geometric and Semantic Consistency

CVPR 2025 Highlight

Li Jin¹,⁷

Yujie Wang³,⁴

Wenzheng Chen⁵,⁶

Qiyu Dai³

Qingzhe Gao³

Xueying Qin¹,⁷

Baoquan Chen²*

¹School of Software, Shandong University

²State Key Laboratory of General Artificial Intelligence, Peking University

³School of Intelligence Science and Technology, Peking University

⁴UNC Chapel Hill

⁵Wangxuan Institute of Computer Technology, Peking University

⁶State Key Laboratory of Multimedia Information Processing, Peking University, Beijing, P.R. China

⁷Engineering Research Center of Digital Media Technology, Ministry of Education, P.R. China

3D object canonicalization is a fundamental task, essential for a variety of downstream tasks. Existing methods rely on either cumbersome manual processes or priors learned from extensive, per-category training samples. Real-world datasets, however, often exhibit long-tail distributions, challenging existing learning-based methods, especially in categories with limited samples. We address this by introducing the first one-shot category-level object canonicalization framework, requiring only a single canonical model as a reference (the "prior model") for each category. To canonicalize any object, our framework first extracts semantic cues with large language models (LLMs) and vision-language models (VLMs) to establish correspondences with the prior model. We introduce a novel joint energy function to enforce geometric and semantic consistency, aligning object orientations precisely despite significant shape variations. Moreover, we adopt a support-plane strategy to reduce the search space for initial poses and utilize a semantic relationship map to select the canonical pose from multiple hypotheses. Extensive experiments on multiple datasets demonstrate that our framework achieves state-of-the-art performance and validate key design choices. Using our framework, we create the Canonical Objaverse Dataset (COD), canonicalizing 32K samples in the Objaverse-LVIS dataset, underscoring the effectiveness of our framework in handling large-scale datasets.

Method Overview

Our approach enables category-level object canonicalization using a single prior model for each category. We begin by utilizing large language models (LLMs) and vision-language models (VLMs) to capture the 3D semantics of both the prior model and the test model, establishing semantic correspondences (left). Next, we generate canonical pose hypotheses and introduce a joint energy function that integrates semantic and geometric cues, facilitating accurate alignment with the prior model (middle). Finally, we identify the optimal canonical pose using a semantic relationship map (right), which evaluates the consistency of semantic positions.
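The middle stage can be illustrated with a minimal sketch. It is not the energy function from the paper, whose exact form is not given on this page. As assumptions for illustration only, the sketch takes the pose hypotheses to be the 24 axis-aligned rotations, uses a symmetric Chamfer distance as the geometric term, uses the distance between centroids of same-labeled parts as the semantic term, and assumes that part labels are already available and that both shapes are centered and scale-normalized. All function names and the `semantic_weight` parameter are hypothetical.

```python
import itertools

def determinant(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def axis_aligned_rotations():
    """Enumerate the 24 proper rotations that map coordinate axes to axes."""
    rotations = []
    for perm in itertools.permutations(range(3)):
        for signs in itertools.product((1, -1), repeat=3):
            rot = [[0, 0, 0] for _ in range(3)]
            for row, (col, sign) in enumerate(zip(perm, signs)):
                rot[row][col] = sign
            if determinant(rot) == 1:  # keep rotations, drop reflections
                rotations.append(rot)
    return rotations

def rotate(rot, point):
    """Apply a 3x3 rotation matrix to a 3D point."""
    return tuple(sum(rot[i][j] * point[j] for j in range(3)) for i in range(3))

def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def chamfer(points_a, points_b):
    """Symmetric Chamfer distance (mean squared nearest-neighbor distance)."""
    a_to_b = sum(min(sq_dist(p, q) for q in points_b) for p in points_a) / len(points_a)
    b_to_a = sum(min(sq_dist(q, p) for p in points_a) for q in points_b) / len(points_b)
    return a_to_b + b_to_a

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def joint_energy(rot, test_parts, prior_parts, semantic_weight=1.0):
    """Assumed energy: geometric term + semantic_weight * semantic term."""
    rotated = {label: [rotate(rot, p) for p in pts] for label, pts in test_parts.items()}
    all_test = [p for pts in rotated.values() for p in pts]
    all_prior = [p for pts in prior_parts.values() for p in pts]
    geometric = chamfer(all_test, all_prior)
    shared = sorted(set(rotated) & set(prior_parts))
    semantic = sum(
        sq_dist(centroid(rotated[label]), centroid(prior_parts[label])) for label in shared
    ) / max(len(shared), 1)
    return geometric + semantic_weight * semantic

def align_to_prior(test_parts, prior_parts, semantic_weight=1.0):
    """Return the candidate rotation with the lowest joint energy, and that energy."""
    candidates = axis_aligned_rotations()
    energies = [joint_energy(r, test_parts, prior_parts, semantic_weight) for r in candidates]
    best = min(range(len(candidates)), key=energies.__getitem__)
    return candidates[best], energies[best]

# Toy example: a "mug" prior and a test copy of it in a non-canonical pose.
prior = {"body": [(0, 0, 0), (0, 0, 1), (0, 0, 2)], "handle": [(1, 0, 1)]}
tilt = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
test = {label: [rotate(tilt, p) for p in pts] for label, pts in prior.items()}
best_rot, best_energy = align_to_prior(test, prior)
```

In this toy case the selected rotation undoes `tilt` and maps the test shape back onto the prior. A practical system would search a continuous pose space and would need a semantic term that tolerates large shape differences between the test and prior models, which this sketch does not attempt.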

[Figure: method overview]
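The final selection step can be sketched in a similar spirit. The page does not define the semantic relationship map, so the encoding below is an assumption: for every pair of part labels it stores the sign of the centroid offset along each axis, and a hypothesis is scored by the fraction of entries that agree with the prior model. The names `relationship_map`, `consistency`, and `select_canonical` are hypothetical.

```python
import itertools

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def sign(value, tol=1e-6):
    """Return -1, 0, or +1, treating values within tol of zero as 0."""
    if abs(value) <= tol:
        return 0
    return 1 if value > 0 else -1

def relationship_map(parts, tol=1e-6):
    """Assumed encoding: per-axis sign of the centroid offset for each label pair."""
    centers = {label: centroid(pts) for label, pts in parts.items()}
    relations = {}
    for a, b in itertools.combinations(sorted(centers), 2):
        relations[(a, b)] = tuple(
            sign(centers[a][i] - centers[b][i], tol) for i in range(3)
        )
    return relations

def consistency(candidate_parts, prior_map):
    """Fraction of (pair, axis) entries on which the candidate agrees with the prior."""
    candidate_map = relationship_map(candidate_parts)
    shared = [pair for pair in prior_map if pair in candidate_map]
    if not shared:
        return 0.0
    matches = sum(
        c == p
        for pair in shared
        for c, p in zip(candidate_map[pair], prior_map[pair])
    )
    return matches / (3 * len(shared))

def select_canonical(hypotheses, prior_parts):
    """Return the index of the most consistent hypothesis and all scores."""
    prior_map = relationship_map(prior_parts)
    scores = [consistency(h, prior_map) for h in hypotheses]
    best = max(range(len(hypotheses)), key=scores.__getitem__)
    return best, scores

# Toy example: a chair prior, an upside-down hypothesis, and an upright one.
prior = {"seat": [(0, 0, 1)], "back": [(0, -1, 2)], "legs": [(0, 0, 0)]}
flipped = {"seat": [(0, 0, -1)], "back": [(0, 1, -2)], "legs": [(0, 0, 0)]}
upright = {"seat": [(0, 0, 1)], "back": [(0, -1, 2)], "legs": [(0, 0, 0)]}
best_index, scores = select_canonical([flipped, upright], prior)
```

Here the upright hypothesis agrees with the prior on every entry, while the flipped one places the back below the legs and scores lower. A sign-based map is only one possible choice; it ignores distances and can be sensitive to parts whose centroids nearly coincide along an axis.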

Comparison with Canonical Datasets

Compared to existing datasets, COD features the largest number of categories and shapes. More importantly, with just two annotators working for approximately eight hours, we completed the alignment of the 40k-shape dataset and obtained 33k valid samples, which demonstrates the framework's capability to handle larger-scale datasets. Next, we will process the Objaverse-1.0 dataset, which contains 800k shapes.

[Figure: comparison with existing canonical datasets]

Data Statistics and Analysis

The figure below compares the Objaverse-LVIS dataset before and after applying our canonicalization method. Before canonicalization, only 24% of the objects were properly aligned. After our processing, the proportion of canonicalized data increased by 55 percentage points, to 79%, highlighting the effectiveness of our approach. We then created the Canonical Objaverse Dataset (COD) by extracting this canonical 79% of objects from the Canonical Objaverse-LVIS Dataset.

Visual Comparisons

We propose a one-shot canonicalization method based on semantic and support-plane information, enabling the canonicalization of 3D objects within the same category even when they differ significantly in shape and appearance. In the comparisons, "Initial" refers to the original objects, while "Canonical" denotes the canonicalized objects. "Objects" and "Semantics" show the shape of each object and its semantic labeling, respectively.