Machine learning papers

Dec 29, 2021

A collection of papers that I found useful. The (Vision) Transformer and crowd counting sections cover the topic of my master's thesis.

The papers are grouped by topic, but within each group they are in no particular order.

Papers

(Vision) Transformer

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Attention Is All You Need
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Transformer in Transformer
Not a paper, but it explains the concept really well: https://amaarora.github.io/2021/01/18/ViT.html

Crowd counting

Distribution Matching for Crowd Counting
Understanding the impact of mistakes on background regions in crowd counting
PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting
CCTrans: Simplifying and Improving Crowd Counting with Transformer
Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework
Learning from synthetic data for crowd counting in the wild
Learning To Count Objects in Images
Adaptive Density Map Generation for Crowd Counting
CNN-based Density Estimation and Crowd Counting: A Survey
Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting
Tracking Pedestrian Heads in Dense Crowd

Other topics

Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition
Multi-scale context aggregation by dilated convolutions
CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users
Exploring Simple Siamese Representation Learning
Signature Verification using a “Siamese” Time Delay Neural Network
DropBlock: A regularization method for convolutional networks
FaceNet: A Unified Embedding for Face Recognition and Clustering
Generative Adversarial Networks
Deep Residual Learning for Image Recognition
Very Deep Convolutional Networks for Large-Scale Image Recognition
You Only Look Once: Unified, Real-Time Object Detection
YOLO9000: Better, Faster, Stronger
YOLOv3: An Incremental Improvement
YOLOv4: Optimal Speed and Accuracy of Object Detection
Scaled-YOLOv4: Scaling Cross Stage Partial Network
CSPNet: A New Backbone that can Enhance Learning Capability of CNN