Seeing What You Say: Expressive Image Generation from Speech

Jan 1, 2025 · Jiyoung Lee, Song Park, Sanghyuk Chun, Soo-Whan Chung
Type: Conference paper
Publication: Proceedings of the IEEE International Conference on Computer Vision Workshops
Last updated on Jan 1, 2025
Authors
Song Park, Research Scientist


© 2026 Song Park. This work is licensed under CC BY-NC-ND 4.0.
