Git config
I like to keep a global git config that covers my usual git setup: the commands and abbreviations I typically use, my username and my email address. It can be helpful to override some of these settings for a single project, e.g. when your global config uses your regular email address, but one local folder holds work for your employer and should use your work email address instead.
Bash string manipulation
When I write bash scripts in my terminal, I often need to manipulate strings. Unfortunately, I keep forgetting how to do this properly in bash, so I wrote this blog article to help me remember in the future. Hopefully it will be helpful for some of you developers out there as well. String manipulation in bash is not hard, but I find some of the notation a bit cumbersome, especially when I normally work with Python or other languages.
Label noise introduction
Training machine learning models requires a lot of data. Often, it is quite costly to obtain sufficient data for your problem. Sometimes, you might even need domain experts, who have little time and are expensive. One option is to get cheaper, lower-quality data, i.e. have less experienced people annotate the data. This usually has the side effect of making your labels noisier.
Instead of using sometimes confusing indexing in your code, use a namedtuple. It’s backwards compatible, so you can still use the index, but it makes your code much more readable. This is especially helpful when you convert between PIL- and numpy-based code, where PIL uses (column, row) notation while numpy uses (row, column) notation. Let’s consider this piece of code where we want to get the pixel locations of several points which are in the numpy format:
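As a minimal sketch of that idea, here is how a namedtuple can keep the two coordinate orders apart. The names `Pixel` and `to_pil_xy` are made up for illustration and the points are hypothetical; only the standard library is used.

```python
from collections import namedtuple

# Hypothetical name: a point stored in numpy order (row first, then column).
Pixel = namedtuple("Pixel", ["row", "col"])

# Illustrative points in numpy (row, column) order.
points = [Pixel(row=10, col=200), Pixel(row=35, col=40)]


def to_pil_xy(point):
    """Return the point in PIL order, i.e. (column, row)."""
    return (point.col, point.row)


pil_points = [to_pil_xy(p) for p in points]

# Plain index access still works, so older code keeps running.
first_row = points[0][0]
```

Reading `point.col` and `point.row` makes the swap explicit, which is harder to get wrong than `point[1], point[0]`.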
Einops is a really great library for improving your machine learning code. It supports NumPy, PyTorch, TensorFlow and many more machine learning libraries. It gives more semantic meaning to your code and can also save you a lot of headaches when transforming data. As a primer, let’s look at a typical use case in machine learning where you have a bunch of data and want to reshape it so that some dimensions are merged together, like this:
Follow me on Twitter: @mpaepper