1- Camera-Pose-Aware Reward:
Qualitative results of the camera-pose-aware texture generation experiment for different text prompts on different 3D mesh objects. The objective of this experiment is to learn optimal camera viewpoints such that, when the object is rendered and textured from these views, the resulting texture maximizes the average aesthetic reward. Consequently, by maximizing this reward, the model becomes invariant to the initial camera positions: regardless of where the cameras start, training adjusts their azimuth and elevation to surround the 3D object in a way that yields high-quality, aesthetically pleasing textures.
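Conceptually, this objective reduces to gradient ascent on camera angles through a differentiable rendering-and-scoring path. The PyTorch sketch below illustrates the optimization pattern only; render_views and aesthetic_reward are hypothetical placeholders standing in for the differentiable renderer and the aesthetic reward model, not the actual implementation.

import torch

def render_views(azimuth, elevation):
    # Hypothetical stand-in for a differentiable renderer that maps camera
    # angles to rendered images of the textured mesh.
    return torch.stack([torch.sin(azimuth), torch.cos(elevation)], dim=-1)

def aesthetic_reward(images):
    # Hypothetical stand-in for a differentiable aesthetic scorer.
    return images.mean(dim=-1)

# Learnable camera parameters: one (azimuth, elevation) pair per viewpoint.
azimuth = torch.nn.Parameter(torch.rand(4) * 6.28)
elevation = torch.nn.Parameter(torch.rand(4) * 1.57)
opt = torch.optim.Adam([azimuth, elevation], lr=1e-2)

for step in range(100):
    opt.zero_grad()
    images = render_views(azimuth, elevation)
    # Maximize the average aesthetic reward across viewpoints; gradients
    # adjust the camera angles regardless of their initialization.
    loss = -aesthetic_reward(images).mean()
    loss.backward()
    opt.step()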
2- Symmetry-Aware Texture Generation Reward:
Qualitative results of the symmetry-aware texture generation experiment on a balloon mesh object. For each row (before and after fine-tuning), we show the rendered 3D object from multiple viewpoints, alongside the corresponding texture images (rightmost column), which highlight the symmetric regions. A vertical dashed line marks the symmetry axis in each texture image. The purple plane passing through the center of the balloon in each viewpoint indicates the estimated symmetry plane of the object. As shown, compared to the pre-trained model, our method generates textures that are more consistent across symmetric parts of the mesh. Without symmetry supervision, patterns often differ noticeably between the two sides. In contrast, textures trained with the proposed symmetry reward exhibit visually coherent features across symmetric regions, demonstrating the reward’s effectiveness in enforcing symmetry consistency.
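One simple way to realize such a reward, assuming the symmetry plane projects to a vertical axis in UV space (the dashed line in the texture images), is to compare the texture against its horizontal mirror inside the symmetric regions. The sketch below is a minimal PyTorch illustration under that assumption; the symmetric-region mask is a hypothetical input, not the paper's exact formulation.

import torch

def symmetry_reward(texture, mask):
    # texture: (C, H, W) texture image; mask: (H, W) symmetric-region mask.
    mirrored = torch.flip(texture, dims=[-1])    # reflect across the vertical axis
    mirrored_mask = torch.flip(mask, dims=[-1])
    m = (mask * mirrored_mask).unsqueeze(0)      # texels valid on both sides
    # Higher reward when each texel agrees with its mirror counterpart.
    sq_err = ((texture - mirrored) ** 2 * m).sum() / m.sum().clamp(min=1.0)
    return -sq_err

texture = torch.rand(3, 256, 256, requires_grad=True)
mask = torch.ones(256, 256)
symmetry_reward(texture, mask).backward()  # gradients reach the texture and, upstream, the generator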
3- Geometry-Texture Alignment Reward:
Qualitative results of the geometry-texture alignment experiment on a rabbit (bunny) mesh. For each row (before and after fine-tuning), we show the rendered 3D object from multiple viewpoints, with the corresponding texture image in the rightmost column. As shown, our method produces textures whose patterns align more closely with the mesh’s curvature directions, unlike the pre-trained model. Moreover, a notable outcome in our results is the emergence of repetitive texture patterns after fine-tuning with the geometry-texture alignment reward. This behavior arises from the differentiable sampling strategy used during reward computation: it encourages the model to place edge features at specific UV coordinates, which ultimately results in structured, repeated patterns in the texture.
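As a rough illustration of the differentiable sampling strategy mentioned above, the sketch below samples the texture at per-vertex UV coordinates with a bilinear lookup, estimates the local texture gradient by finite differences in UV space, and rewards agreement with the curvature direction projected into UV. The curvature directions (curv_dir_uv) and UV coordinates are hypothetical precomputed inputs; this is an assumption-laden sketch rather than the paper's exact reward.

import torch
import torch.nn.functional as F

def sample_texture(texture, uv):
    # texture: (1, C, H, W); uv in [0, 1]^2 with shape (N, 2).
    grid = (uv * 2.0 - 1.0).view(1, -1, 1, 2)    # grid_sample expects [-1, 1]
    out = F.grid_sample(texture, grid, align_corners=True)
    return out.squeeze(-1).squeeze(0).T          # (N, C)

def alignment_reward(texture, uv, curv_dir_uv, eps=1e-2):
    # Finite-difference texture gradients along the UV axes (kept differentiable).
    base = sample_texture(texture, uv)
    du = sample_texture(texture, uv + torch.tensor([eps, 0.0])) - base
    dv = sample_texture(texture, uv + torch.tensor([0.0, eps])) - base
    grad = torch.stack([du.norm(dim=1), dv.norm(dim=1)], dim=1)  # per-axis edge strength
    grad = grad / grad.norm(dim=1, keepdim=True).clamp(min=1e-6)
    # Curvature directions are sign-ambiguous, so score the absolute alignment.
    return (grad * curv_dir_uv).sum(dim=1).abs().mean()

texture = torch.rand(1, 3, 256, 256, requires_grad=True)
uv = torch.rand(1024, 2)
curv_dir_uv = F.normalize(torch.randn(1024, 2), dim=1)
alignment_reward(texture, uv, curv_dir_uv).backward()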
4- Texture-Features Emphasis Reward:
Qualitative results of the texture feature emphasis experiment on a rabbit (bunny) object. For each row (before and after fine-tuning), we show the rendered 3D object from multiple viewpoints, with the corresponding texture image in the rightmost column. The goal of this experiment is to learn texture images whose salient features (e.g., edges) are emphasized at regions of high surface bending, as measured by the magnitude of the mean curvature. This encourages texture patterns that highlight the 3D surface structure while preserving perceptual richness through color variation. As illustrated, our method enhances texture features, such as edges and mortar, in proportion to the local curvature; the pre-trained model lacks this capability and often produces pattern-less (white) areas, particularly on the back and head of the rabbit.
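A minimal sketch of this idea, assuming the mean-curvature magnitude has been baked into a UV-space map (a hypothetical precomputed input), is to weight a differentiable edge detector by that map:

import torch
import torch.nn.functional as F

def edge_magnitude(texture):
    # texture: (1, C, H, W) -> per-pixel Sobel edge strength (1, 1, H, W).
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(-1, -2)
    gray = texture.mean(dim=1, keepdim=True)     # luminance proxy
    gx = F.conv2d(gray, kx, padding=1)
    gy = F.conv2d(gray, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def emphasis_reward(texture, curvature_map):
    # Reward strong texture edges where the surface bends the most.
    edges = edge_magnitude(texture).squeeze()
    weights = curvature_map / curvature_map.sum().clamp(min=1e-8)
    return (edges * weights).sum()

texture = torch.rand(1, 3, 256, 256, requires_grad=True)
curvature_map = torch.rand(256, 256)             # |mean curvature| per texel (hypothetical)
emphasis_reward(texture, curvature_map).backward()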
Abstract
While recent 3D generative models can produce high-quality texture images, they often fail to capture human preferences or meet task-specific requirements. Moreover, a core challenge in 3D texture generation is that most existing approaches rely on repeated calls to 2D text-to-image generative models, which lack an inherent understanding of the 3D structure of the input mesh. To alleviate these issues, we propose an end-to-end differentiable, reinforcement-learning-free framework that embeds human feedback, expressed as differentiable reward functions, directly into the 3D texture synthesis pipeline. By back-propagating preference signals through both the geometric and appearance modules of the proposed framework, our method generates textures that respect the underlying 3D geometry and align with the desired criteria. To demonstrate its versatility, we introduce three novel geometry-aware reward functions, offering a more controllable and interpretable pathway for creating high-quality 3D content from natural language. Through qualitative, quantitative, and user-preference evaluations against state-of-the-art methods, we demonstrate that our proposed strategy consistently outperforms existing approaches.
Methodology
An overview of the proposed training process, consisting of two main stages: (i) texture generation, where a latent diffusion model generates high-quality images from textual prompts. Combined with differentiable rendering and 3D vision techniques, this step produces realistic textures for 3D objects. (ii) texture reward learning, where an end-to-end differentiable pipeline fine-tunes the pre-trained text-to-image diffusion model by maximizing a differentiable reward function r. Gradients are back-propagated through the entire 3D generative pipeline, making the process inherently geometry-aware. To demonstrate the method’s effectiveness in producing textures aligned with 3D geometry, we introduce five novel geometry-aware reward functions.
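The overall training loop can be summarized by the following minimal, self-contained PyTorch sketch of stage (ii). Every module here (TextureGenerator standing in for the latent diffusion model plus UV back-projection, render for the differentiable rasterizer, reward_fn for any of the reward functions above) is a hypothetical placeholder, not the actual pipeline.

import torch

class TextureGenerator(torch.nn.Module):
    # Hypothetical stand-in for the fine-tuned text-to-image diffusion model.
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Linear(16, 3 * 64 * 64)
    def forward(self, prompt_embedding):
        return self.net(prompt_embedding).view(3, 64, 64).sigmoid()

def render(texture, view):
    # Placeholder for differentiable rasterization of the textured mesh.
    return texture * view

def reward_fn(images):
    # Any differentiable reward r (aesthetic, symmetry, alignment, emphasis, ...).
    return images.mean()

model = TextureGenerator()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
prompt = torch.randn(16)
views = [torch.rand(3, 64, 64) for _ in range(4)]

for step in range(10):
    opt.zero_grad()
    texture = model(prompt)                          # stage (i): texture generation
    renders = torch.stack([render(texture, v) for v in views])
    loss = -reward_fn(renders)                       # stage (ii): maximize reward r
    loss.backward()                                  # gradients flow through the whole pipeline
    opt.step()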
Comparative Results
Qualitative Comparison
Quantitative Comparison
More Results
BibTeX
@article{zamani2025geometry,
  title={Geometry-Aware Preference Learning for 3D Texture Generation},
  author={Zamani, AmirHossein and Xie, Tianhao and Aghdam, Amir G and Popa, Tiberiu and Belilovsky, Eugene},
  journal={arXiv preprint arXiv:2506.18331},
  year={2025},
  url={https://ahhhz975.github.io/DifferentiableTextureLearning/}
}