A collection of papers that I found useful. The (vision)Transformer and Crowd counting sections cover the topics of my master's thesis.
The papers are grouped by topic, but within each group they are in no particular order.
Papers
(vision)Transformer
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions — link
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale — link
- Twins: Revisiting the Design of Spatial Attention in Vision Transformers — link
- Attention Is All You Need — link
- VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text — link
- Transformer in Transformer — link
- Not a paper, but it explains the concept really well: https://amaarora.github.io/2021/01/18/ViT.html
Crowd counting
- Distribution Matching for Crowd Counting — link
- Understanding the impact of mistakes on background regions in crowd counting — link
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting — link
- CCTrans: Simplifying and Improving Crowd Counting with Transformer — link
- Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework — link
- Learning from synthetic data for crowd counting in the wild — link
- Learning To Count Objects in Images — link
- Adaptive Density Map Generation for Crowd Counting — link
- CNN-based Density Estimation and Crowd Counting: A Survey — link
- Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting — link
- Tracking Pedestrian Heads in Dense Crowd — link
Misc
- Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition — link
- Multi-scale context aggregation by dilated convolutions — link
- CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users — link
- Exploring Simple Siamese Representation Learning — link
- Signature Verification using a “Siamese” Time Delay Neural Network — link
- DropBlock: A regularization method for convolutional networks — link
- FaceNet: A Unified Embedding for Face Recognition and Clustering — link
- Generative Adversarial Networks — link
- Deep Residual Learning for Image Recognition — link
- Very Deep Convolutional Networks for Large-Scale Image Recognition — link
- You Only Look Once: Unified, Real-Time Object Detection — link
- YOLO9000: Better, Faster, Stronger — link
- YOLOv3: An Incremental Improvement — link
- YOLOv4: Optimal Speed and Accuracy of Object Detection — link
- Scaled-YOLOv4: Scaling Cross Stage Partial Network — link
- CSPNet: A New Backbone that can Enhance Learning Capability of CNN — link
Books
- Deep Learning with Python — Francois Chollet
- Deep Learning — Ian Goodfellow, Yoshua Bengio, Aaron Courville
- Data Science for Business
- An Introduction to 3D Computer Vision Techniques and Algorithms
- Practical Deep Learning for Cloud, Mobile & Edge
- Machine Learning for Time Series Forecasting with Python
- Mastering Machine Learning on AWS
- Multiple View Geometry in Computer Vision
Others
https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap