Semantic segmentation is a complex task for deep neural networks, especially when limited training data is available. Unlike image classification problems such as Imagenet, semantic segmentation requires a class prediction for every individual pixel rather than just an image-level class. This requires a high level of detail and can be difficult to achieve with limited labeled data. Obtaining labeled data for semantic segmentation is challenging, as it requires precise pixel annotation, which is time-consuming for humans.
Stable diffusion is all the rage in the deep learning community at the moment. It’s trending on Twitter at #stablediffusion and gaining large amounts of attention all over the internet. We’ll take a look into the reasons for all the attention to stable diffusion and more importantly see how it works under the hood by considering the well-written paper “High-resolution image synthesis with latent diffusion models” by Rombach et al which is the foundation of the system.
This is a follow-up to my previous post of Depthwise Separable Convolutions in PyTorch. This article is based on the nice CVPR paper titled “Rethinking Depthwise Separable Convolutions: How Intra-Kernel Correlations Lead to Improved MobileNets” by Haase and Amthor. Previously I took a look at depthwise separable convolutions which are a drop-in replacement for standard convolutions, but focused on computational and parameter-based efficiency. Basically, you can gain similar results with a lot less parameters and FLOPs, so they are used in MobileNet style architectures.
Today’s paper: Emerging properties in self-supervised vision transformers by Mathilde Caron et al. Let’s get the dinosaur out of the room: the name DINO refers to self-distillation with no labels. The self-distillation part refers to self-supervised learning in a student-teacher setup as is often seen for distillation. However, the catch is that in contrast to normal distillation setups where a previously trained teacher network is training a student network, here they work without labels and without pre-training the teacher.
Today’s paper: Rethinking ‘Batch’ in BatchNorm by Wu & Johnson BatchNorm is a critical building block in modern convolutional neural networks. Its unique property of operating on “batches” instead of individual samples introduces significantly different behaviors from most other operations in deep learning. As a result, it leads to many hidden caveats that can negatively impact model’s performance in subtle ways. This is a citation from the paper’s abstract and the emphasis is mine which caught my attention.