These are the slides used in the following video on the YouTube nnabla channel.
[Deep Learning Course] Fundamentals and Applications of Transformers -- Part 3: Applications of Transformers to Images
https://youtu.be/rkuayDInyF0
[References]
・Deep Residual Learning for Image Recognition
https://arxiv.org/abs/1512.03385
・An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
https://arxiv.org/abs/2010.11929
・On the Relationship between Self-Attention and Convolutional Layers
https://arxiv.org/abs/1911.03584
・Image Style Transfer Using Convolutional Neural Networks
https://ieeexplore.ieee.org/document/7780634
・Are Convolutional Neural Networks or Transformers more like human vision?
https://arxiv.org/abs/2105.07197
・How Do Vision Transformers Work?
https://arxiv.org/abs/2202.06709
・Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
https://arxiv.org/abs/1610.02391
・Quantifying Attention Flow in Transformers
https://arxiv.org/abs/2005.00928
・Transformer Interpretability Beyond Attention Visualization
https://arxiv.org/abs/2012.09838
・End-to-End Object Detection with Transformers
https://arxiv.org/abs/2005.12872
・SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
https://arxiv.org/abs/2105.15203
・Training data-efficient image transformers & distillation through attention
https://arxiv.org/abs/2012.12877
・Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
https://arxiv.org/abs/2103.14030
・Masked Autoencoders Are Scalable Vision Learners
https://arxiv.org/abs/2111.06377
・Emerging Properties in Self-Supervised Vision Transformers
https://arxiv.org/abs/2104.14294
・Scaling Laws for Neural Language Models
https://arxiv.org/abs/2001.08361
・Learning Transferable Visual Models From Natural Language Supervision
https://arxiv.org/abs/2103.00020
・Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2403.03206
・Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models
https://arxiv.org/abs/2402.17177
・SSII2024 Technology Map
https://confit.atlas.jp/guide/event/ssii2024/static/special_project_tech_map