Design Patterns – An introduction

Design patterns are a quintessential topic that intimidates most entry-level software developers. This post is my attempt at explaining the concepts in a simple way. I will avoid going into detailed examples since there are a lot of great links, videos and books that do that well. I’d like to explain the core ideas behind design patterns concisely. This post will provide a high-level summary of the types of design patterns and when they are used.



A design pattern is a general, reusable solution to a commonly occurring problem in software design.
There are three main categories design patterns can be divided into:
  1. Creational
  2. Structural
  3. Behavioral


They speed up development by reusing existing paradigms (tested and proven to be effective).

Drawing a picture of a few objects takes time. But if we use stencils of objects to compose the drawing, it can be done much faster.
A design pattern is like a template (stencil) to solve problems in software development.


Creational patterns

are used for creating instances of classes and objects. They have two main ideas:
  1. Encapsulate the information about which concrete class to use at creation
  2. Hide the nitty-gritty details of creation and composition of the concrete classes and objects
Object creational patterns delegate object creation to another object.
Class creational patterns use the principle of inheritance to delegate object creation to subclasses. Some patterns that fall in the category of creational patterns are:
  • Factory pattern allows a class to delegate instantiation to its subclasses
  • Abstract factory pattern provides an interface to create a family of objects without specifying the concrete classes
  • Builder pattern encapsulates object construction from object representation
  • Prototype pattern clones from a fully initialized instance
  • Singleton pattern ensures that a class has only one instance during program execution and provides a global point of access to that instance


Structural patterns

simplify composition of entities (their relationships and functionality). Some common patterns in this category are:
  • Adapter pattern matches interface of a class to what the client expects
  • Bridge pattern decouples the interface from implementation to provide ‘loose coupling’
  • Composite pattern is a tree structure of objects with the same interface. Some objects are simple, while others are complex but they all have a uniform interface
  • Decorator pattern provides a way to extend functionality of a class during runtime without changing the interface
  • Facade pattern provides a higher level simplified interface to a set of interfaces in a subsystem
  • Flyweight pattern enables sharing by allowing many objects to share common properties
  • Proxy pattern provides a placeholder for another object to control access


Behavioral patterns

provide flexibility in communication across objects. Some common patterns in this category are:

  • Chain of responsibility passes the request along a chain of receivers until an object handles it
  • Command pattern encapsulates a request into an object
  • Iterator pattern provides a way to access the elements of a collection of objects without exposing the underlying details
  • Mediator pattern defines how different objects interact without referring to each other explicitly
  • Null object pattern provides a default object to avoid null references
  • Observer pattern defines a one-to-many dependency across objects so that a state change in one object causes a notification to be sent to all its dependents
  • State pattern allows an object to change behavior when there is an internal state change
  • Strategy pattern defines a family of algorithms that are encapsulated to be interchangeable. Thus the algorithm can vary independently of the clients that use it
  • Template pattern lets subclasses redefine (override) some (specific) steps of an algorithm without affecting the high level structure of that algorithm
  • Visitor pattern allows a new operation to be defined without changing the class definition of the entities on which it operates

Deep Learning

Recently the term deep learning has become a buzzword in the field of technology. People are excited about using deep learning to develop products that solve problems (including those in the field of Artificial Intelligence, a.k.a. AI) which seemed impossible (or too futuristic) in the past.
A few days ago, news that a computer system trained to play Go using deep learning had beaten a top human Go player caught my attention. Until now, Go (unlike chess) was a game at which computers couldn’t consistently beat good human players. So I was intrigued: what is this deep learning? I set out to figure it out using some papers, lectures and this book.

“If you can’t explain it simply, you don’t understand it well enough.”

Albert Einstein.

So this post is an attempt to explain some of these concepts and to ensure that I understand them well.
Let’s start with concisely stating what deep learning is and then go into further details.


Deep Learning is an approach to Machine Learning based on the knowledge of
  • learning algorithms
  • statistics
  • applied mathematics

Why now?

Deep Learning has recently been revived in popularity due to the increased computation ability of CPUs and general-purpose GPUs, huge amounts of data (quite a bit of it labeled) and new techniques to train deeper neural networks.


Let’s take a step back to understand how deep learning fits with AI.
  • Classical algorithms in Computer Science map representations to output.
  • The goal of AI is to learn how to generate these representations in order to convert them to output, without human intervention.
  • There are many factors that humans use to understand / classify the observed input data into output.
  • Usually, humans are required to disentangle these features and remove the redundant ones to specify them as representations in classical algorithms.
Deep Learning organizes these features in a hierarchy where simpler concepts combine to generate a complex one.
Deep learning uses multi-layered neural networks to understand inputs and extract abstract concepts (a.k.a. features) from them. These abstract concepts are combined hierarchically to generate complex representations that in turn generate output.
It is also worth noting that Deep Learning does not attempt to simulate the human brain. Instead it takes the knowledge we have about learning hierarchical representations automatically from data and uses it to build computer systems that solve tasks requiring intelligence.

Minerva – an overview

This blog has not had any posts for a while. The main reasons are that I have been busy reading a lot of stuff (but did not have any additional comments on the books I read to blog about) and I have been working on a side project to get some experience building a machine learning framework from scratch. In the process of building it, I have learned a lot of things that I feel could be useful to someone trying a similar endeavor. So I have decided to blog about it.
The framework is called minerva (from the Roman goddess of wisdom). The source code can be found on my github page. My husband and I were looking for a project idea that had the following attributes:

  1. A generic framework for a classification system that could handle various types of data
  2. A high performance framework
  3. Easily scalable
  4. Implements some ideas that have gained interest in the field of ‘deep learning’ a.k.a automatic feature selection

We decided to implement a configurable neural network library from scratch in C++. We wanted this neural network to be able to perform automatic feature selection. After reading some research papers on automatic feature selection, the sparse autoencoder technique seemed promising enough to try out. Originally, the dataset we used for this project came from a Kaggle competition to perform multi-modal gesture recognition from videos. We used this dataset to set up and pipe-clean the flow (which supports image and video inputs). We tried our best to design a system with clean interfaces. We wanted to ensure that supporting different libraries (matrix, image, etc.) would be seamless. This came in handy when we decided to add support to run minerva on GPUs. We simply plugged in support for GPU matrix libraries.

So, how do you use minerva? Well it isn’t as easy as pressing a button or running an executable / script (although I plan to add support for a simpler way by adding a wrapper script in the near future). But for now, there are four main steps.

  1. Creation of the neural network – In this step, we create an initial model of a neural network.
  2. Unsupervised Learning – In this step, we run the model from previous step with a lot of input data in ‘unsupervised learning’ mode. The result of this step is a feature selector neural network. More details follow.
  3. Supervised Learning – Run the feature selector model generated in the above step in ‘supervised learning’ mode on labeled training data. The result of this step is a classifier neural network.
  4. Classify (Test) – In this step, we run test data on the classifier neural network and generate output labels for the test data.

1. Creation of the neural network model
The neural network is created with a configuration specified via the command line. It is then serialized and written to disk as a compressed tgz file. The file contains a json file describing attributes like the number of layers and neurons of the neural network, and a bunch of binary files containing randomly initialized matrices which combine to form the neural network.

2. Unsupervised Learning
Unsupervised learning (or ‘deep learning’) is the process of automatically discovering patterns in large data sets without human interaction. Without unsupervised learning, we would need to manually select features that are indicators of how a particular input would be classified. This requires manually labeling thousands or millions of inputs (video frames) to perform classification (identify what an image contains). Additionally, this would need to be robust w.r.t. variations in color, alignment, and other noise. It is not feasible for most projects to devote the resources required to gather such a large number of labeled inputs.

In order to deal with this problem, many projects rely on building domain-specific feature selection systems. Such specific feature selectors preprocess the input data and produce a set of features which capture the essential information from the input data set in a significantly smaller representation. Feature selection reduces the dependence on labeled data by simplifying the problem to determining the class of a dataset using only the most relevant information instead of all information. SIFT is an example of a widely used feature selection system. Most manually-crafted feature selection systems have a fatal flaw: they are designed by developers and tailored to specific problems. As a consequence, they are often brittle (because they are designed using developer / domain expert intuition), and require a complete redesign when moving from one type of input data to another (e.g. from video to text).

We decided to implement a sparse autoencoder technique for our unsupervised learning step. This technique attempts to address the shortcomings of feature selection systems by providing a framework to automatically generate features. At a high level, it uses an artificial neural network trained with unlabeled data and configured in a specific topology. The topology guides the inner layers of the network to respond to patterns in the input data such that the majority of information in the data is captured (the essence of the data, so to speak). After training on unlabeled data, the inner layers of this network can be used directly as features; in other words, the network itself becomes a feature selection system.

The following image includes a visualization of some of the low level features (inputs that a pooling layer neuron maximally responds to) that were learned by Minerva after being presented 10,000 random images.


These compare favorably to the Gabor filter function responses in the top-left of the following figure, which have been found in the first layers of the visual cortex in mammals, and perform well at visual classification tasks:

The accompanying figure on the top-right shows the corresponding spatial domain representation (impulse response) of each of the Gabor Filters.

We mainly referred to the following resources on deep learning during the course of this project:

3. Supervised Learning
In this step, we use the neural network from the earlier unsupervised learning stage as the starting point (instead of a randomly initialized neural network). At the start, this neural network is capable of selecting features. This training step uses data labeled with gesture names for images (which are snapshots of successive video frames). Thus, in this step, we feed labeled data to this network and calibrate it with backpropagation against the expected output labels. Once the neural network output is within the specified threshold (tolerance), the resulting classifier neural network is written out to file. It is now ready to be used for testing with images.

The following image shows the neural network input that produces the maximum response for a neuron trained using Minerva to recognize images of cats.


4. Classification
In this step, we use both the neural networks viz. the feature selector from the unsupervised learning step and the classifier neural network from the supervised learning step to classify test images.

In the next few posts I plan to explain the details and rationale behind the design decisions we made to accomplish the four goals I mentioned at the beginning of this post. Here is how I tentatively plan to partition the topics:

  • Sparsity for unsupervised learning
  • Support for configurable convolutional neural networks (tiling the pixels to capture spatial locality)
  • Various principles to ensure that we truly built a high performance architecture from the start
  • Evaluation of various optimizers for the cost function calculation (required in calibrating the network in back propagation)
  • The various libraries supporting the framework, viz. libraries for linear algebra, video, image and serialization

What & why: machine learning?

I’ve been thinking of a topic/theme for this blog for a while now. For the last few months I have become interested in machine learning, and so have been trying to learn more about the topic. I figured it would be a good exercise to summarize things I learn on this blog. They say a good understanding is a pre-requisite to explaining.

What is machine learning?

Machine learning is the science that deals with inferences. It uses statistical methods to make those inferences. These inferences can then be used for forecasting or predictions.

There are different types of forecasting viz. classification, regression.


The task of determining which one of the available classes a variable belongs to is classification. Think of these values as enumerated types; usually these classes are disjoint. E.g.: a system that detects which blood group a given sample belongs to is a classification system. The sample can take one of the following values: {A, B, AB, O} (let’s forget positive and negative for simplicity).


The task of determining the exact value a target variable can take is regression. This variable can take any value (from the set of real numbers). E.g.: a system that predicts the premium for your car insurance, the cost of a house, or the interest rate on a loan.

Why machine learning?

Traditionally machines have been used to follow instructions, which means we still need to tell them what to do. The next step in the evolution of computing is to enable machines to ‘learn’ just like humans. We can leverage the recent explosion of data-sets (big data) and increased compute ability of processors to ‘teach’ machines to ‘think, predict, estimate, classify’ instead of just following our instructions (executing algorithms we write).

There are a number of ways to ‘teach’ machines. They are broadly categorized as ‘supervised learning’, ‘unsupervised learning’ and ‘reinforcement learning’.

  • Supervised learning: this is analogous to classroom learning for humans.
  • Unsupervised learning: this has no external intervention, the system ingests a lot of training samples and figures out what features matter. I will talk about this in depth in another post.
  • Reinforcement learning: this is a game theory based approach where the system is given rewards or punishments depending on its correct or incorrect predictions. The system modifies its prediction algorithm according to this feedback.

PS: I haven’t blogged for many years. I hope that the ‘flow’ of my posts improves as I continue blogging.

What are we rushing to?

We rush from the constant stream of emails, to reading feeds, checking notifications, watching videos (clips), sharing them, clicking photos, and checking emails that notify us about reactions to what we shared. Why are we hurriedly sprinting through life like there is no tomorrow? Why are we consuming unfiltered media at a rate faster than ever before? Simply because it is available? Why do we share & consume a lot more than we create?

What happened to the days when we sat back & reflected on our thoughts, actions & feelings? When we wrote thoughts, essays & poems during random pondering. We ‘created’ them, didn’t we?

Well, this blog is my attempt at bringing those contemplative times back & creating something (be it a trivial blog post) instead of just being guilty of aforementioned mindless consumption.