Machine learning papers

Francesco
2 min read · Dec 29, 2021

A collection of papers that I found useful. The (vision)Transformer and Crowd counting sections cover the topic of my master's thesis.

The papers are grouped by topic, but within each group they are in no particular order.

Papers

(vision)Transformer

  • Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions - link
  • An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale - link
  • Twins: Revisiting the Design of Spatial Attention in Vision Transformers - link
  • Attention Is All You Need - link
  • VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text - link
  • Transformer in Transformer - link
  • Not a paper, but it explains the concept really well: https://amaarora.github.io/2021/01/18/ViT.html

Crowd counting

  • Distribution Matching for Crowd Counting - link
  • Understanding the Impact of Mistakes on Background Regions in Crowd Counting - link
  • PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting - link
  • CCTrans: Simplifying and Improving Crowd Counting with Transformer - link
  • Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework - link
  • Learning from Synthetic Data for Crowd Counting in the Wild - link
  • Learning To Count Objects in Images - link
  • Adaptive Density Map Generation for Crowd Counting - link
  • CNN-based Density Estimation and Crowd Counting: A Survey - link
  • Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting - link
  • Tracking Pedestrian Heads in Dense Crowd - link

Misc

  • Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition - link
  • Multi-Scale Context Aggregation by Dilated Convolutions - link
  • CHoRaL: Collecting Humor Reaction Labels from Millions of Social Media Users - link
  • Exploring Simple Siamese Representation Learning - link
  • Signature Verification using a “Siamese” Time Delay Neural Network - link
  • DropBlock: A Regularization Method for Convolutional Networks - link
  • FaceNet: A Unified Embedding for Face Recognition and Clustering - link
  • Generative Adversarial Networks - link
  • Deep Residual Learning for Image Recognition - link
  • Very Deep Convolutional Networks for Large-Scale Image Recognition - link
  • You Only Look Once: Unified, Real-Time Object Detection - link
  • YOLO9000: Better, Faster, Stronger - link
  • YOLOv3: An Incremental Improvement - link
  • YOLOv4: Optimal Speed and Accuracy of Object Detection - link
  • Scaled-YOLOv4: Scaling Cross Stage Partial Network - link
  • CSPNet: A New Backbone that can Enhance Learning Capability of CNN - link

Books

  • Deep Learning with Python, by François Chollet
  • Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
  • Data Science for Business
  • An Introduction to 3D Computer Vision Techniques and Algorithms
  • Practical Deep Learning for Cloud, Mobile & Edge
  • Machine Learning for Time Series Forecasting with Python
  • Mastering Machine Learning on AWS
  • Multiple View Geometry in Computer Vision

Others

https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap

Francesco

Master’s degree in Computer Engineering for Robotics and Smart Industry — Smart Systems & Data Analytics