王欣睿/Xinrui Wang

I am currently a Ph.D. candidate in Machine Learning at The University of Tokyo, Matsuo Lab. My research and experience focus on visual generative models, including image and video generation and editing, image-to-image translation, visual stylization, and efficient deep learning systems for mobile devices.

I expect to graduate in Fall 2026 and am actively seeking summer internship or full-time opportunities in visual generative modeling. Please feel free to contact me at secret_wang@outlook.com.

Portrait of Xinrui Wang
Google Scholar GitHub Homepage English Resume Chinese Resume (中文简历)

Publications

Enhancing Reference-based Sketch Colorization via Separating Reference Representations

*Dingkun Yan, *Xinrui Wang, Zhuoru Li, Suguru Saito, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo (* equal contribution)

arXiv 2025

Abstract: This work studies distribution shift in reference-based sketch colorization, where real references can be spatially or semantically misaligned with sketches. It separates reference representations into modular stages for embedding guidance, background detail transfer, and global style transfer, improving visual quality while reducing spatial artifacts.

Towards High-resolution and Disentangled Reference-based Sketch Colorization

*Dingkun Yan, *Xinrui Wang, *Ru Wang, Zhuoru Li, Jinze Yu, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo (* equal contribution)

CVPR 2026 Highlight

Abstract: The paper targets spatial entanglement in high-resolution sketch colorization. It introduces a dual-branch feature alignment framework with Gram regularization, plus anime-specific attribute control and texture-transfer modules, to improve resolution, controllability, and reference consistency.

One-shot Portrait Stylization via Geometric Alignment

Xinrui Wang, Zilin Guo, Zhuoru Li, Jinze Yu, Heng Zhang, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

WACV 2026

Abstract: This work learns a portrait style from a single artistic reference. It combines geometric alignment, content and style LoRA optimization, orthogonal adaptation, and ControlNet guidance to produce high-quality stylized portraits with a smaller computation and parameter budget than common inversion or diffusion baselines.

Real-Time Data-efficient Portrait Stylization via Geometric Alignment

Xinrui Wang, Zhuoru Li, Xuanyu Yin, Xiao Zhou, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo

Neural Networks 2025

Abstract: The method builds geometric correlations between portrait photos and style samples using facial landmarks and differentiable thin-plate-spline alignment. Its lightweight GAN framework improves training-data efficiency and reduces computational cost, enabling real-time portrait stylization on mobile devices.

Image Referenced Sketch Colorization Based on Animation Creation Workflow

*Dingkun Yan, *Xinrui Wang, Zhuoru Li, Suguru Saito, Yusuke Iwasawa, Yutaka Matsuo, Jiaxian Guo (* equal contribution)

CVPR 2025

Abstract: Inspired by professional animation pipelines, this diffusion framework uses sketches for spatial guidance and RGB images for color reference. It separates foreground and background reference signals with spatial masks and split cross-attention LoRA modules to reduce artifacts from mismatched references.

Adaptive Inertia: Disentangling the Effects of Adaptive Learning Rate and Momentum

Zeke Xie, Xinrui Wang, Huishuai Zhang, Issei Sato, Masashi Sugiyama

ICML 2022

Abstract: This paper analyzes why Adam converges quickly but often generalizes worse than SGD. It separates the roles of adaptive learning rate and momentum in saddle-point escape and flat-minima selection, then proposes Adaptive Inertia, a parameter-wise adaptive momentum framework with strong empirical generalization.

Generating Manga from Illustrations via Mimicking Manga Creation Workflow

Lvmin Zhang, Xinrui Wang, Qingnan Fan, Yi Ji, Chunping Liu

CVPR 2021

Abstract: This framework converts digital illustrations into manga by mimicking studio workflows: line drawings, regular screentones, and irregular screen textures. The generated layers can be composed into manga images and further edited, supported by a large artist-annotated dataset.

Learning to Cartoonize Using White-Box Cartoon Representations

Xinrui Wang, Jinze Yu

CVPR 2020

Abstract: The paper introduces a controllable image cartoonization framework by decomposing cartoons into surface, structure, and texture representations. Separate learning objectives for each representation make the model adjustable across cartoon styles and artist requirements.

Work Experience

SoftBank Group

Staff Machine Learning Engineer & Deputy Director

Apr. 2025 - Mar. 2026
  • Transferred internally to SoftBank and worked as a director-level engineer.
  • Researched and developed multiple internal AI and computer vision features based on LLMs, VLMs, and diffusion models.

Japan Computer Vision

Staff Machine Learning Engineer

Aug. 2022 - Mar. 2025
  • Japan Computer Vision (JCV) is a SoftBank Group subsidiary focusing on face recognition. Built its face-recognition system from scratch.
  • Collected and cleaned a dataset of about 2.5M IDs and 70M images, and iteratively optimized the models and the vector matching algorithm with large-scale multi-node, multi-GPU training.
  • Outperformed SenseTime Anysee in multiple benchmarks and internal application-scenario tests.

Tencent

Senior Machine Learning Engineer

Aug. 2020 - May 2022
  • Worked in the WeChat group to develop features for WeChat, an IM application with 1.2 billion DAU worldwide.
  • Developed the face segmentation algorithm for auto makeup in WeChat Channels (微信视频号) live streaming, deployed on mobile, achieving 95% mIoU and 300 FPS on smartphones.
  • Developed a GAN framework for facial editing on smartphones, including style transfer, age changing, and face swapping. Style transfer and age editing run in real time on video, and face swapping takes 180 ms.
  • Developed the Children's Day old-photo effect and video-driven image animation, launched in the 秒剪 mobile app, where the DAU increase on launch day exceeded one million.

ByteDance

Machine Learning Engineer

Jan. 2019 - Jul. 2020
  • Worked in ByteDance AI Lab to develop algorithms for Douyin and TikTok, with 1.5 billion DAU worldwide.
  • Developed super-resolution and HDR algorithms for smartphones, achieving a PSNR above 38 dB. Fully rolled out in 火山视频 (Huoshan Video), saving 2.7 million yuan per day in bandwidth costs.
  • Developed CNN-based face-changing algorithms as a main contributor, running in real time on smartphones.
  • Developed GAN-based face aging and editing features for the 2020 Spring Festival 时光相册 campaign, fully rolled out in the Douyin mobile app.
  • Deeply involved in model training, the mobile inference engine, and the mobile deployment pipeline. Responsible for model design, quantization training, and format conversion.