Oltau is a Fully Automated SaaS Platform to build Computer Vision Bots within minutes, without coding. The story of its formation dates back to 2017.
To understand the pain points of Enterprises in their AI Adoption journey, we decided to participate in Kaggle’s Data Science Bowl Competition.
In this competition, we were given a labeled training dataset of images of cells and we were required to detect the nuclei in those images. Over 3000 teams from all across the globe participated in this battle.
Many of these teams were well-funded startups that had already been working on Object Detection in Images for years; others were Professors and Scientists from Renowned Universities.
By the time we found out about it, we were already late: only two months were left for submission. Others had been working on it for over 6 months.
We trained our model and submitted our first build, and we were placed at a Rank above 1400. Mayank, the God Mathematician in our team, studied all the models available and zeroed in on four models to be tested, apart from the one we had already trained.
Nishant, the God Techie in our team, trained all four models separately and submitted four different builds, and we got Ranks around 800, 1200, 600 & 700.
The model which got us the best Rank of 600 was Mask R-CNN.
Kaggle has a forum where all teams discuss their models and approaches. We found out that most of the teams were using the same Mask R-CNN model, and the guy who created this model was also participating in this competition. The irony was, many of the teams were getting better accuracy than he was, using his own model. Remember, everyone had access to the same training data, which was already labeled.
So neither the model, nor the training data, nor the labeling was affecting the results significantly. What else was playing a role here? That was the million-dollar question, and it piqued our interest.
Moreover, the Ranks were based on the efficiency of the models submitted by the teams, and the efficiencies ranged from 20% to around 60%. There are frameworks available, like TensorFlow, which let you import any model you wish, upload your training data, and train the model, requiring only a very basic level of coding. In a way, even a monkey can build an AI Solution that gives 20% efficiency, and he may claim to be an AI Developer or a Data Scientist on his resume.
We dug a bit deeper into the working of the Neural Networks.
There are hundreds of Models available, each model comes with hundreds of hyper-parameters, like Activation Functions, and each hyper-parameter can take, on average, a hundred values. Even by a conservative count, that is already 100 x 100 x 100, i.e., a million combinations possible; and since hyper-parameters combine multiplicatively when tuned together, the true space is far larger still.
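To see how fast the grid blows up, here's a tiny Python sketch. The hyper-parameters and candidate values below are hypothetical and far fewer than in reality; the point is only that the count is a product, so every added choice multiplies the total:

```python
# Hypothetical, toy-sized hyper-parameter grid. Real grids have far more
# parameters and far more candidate values per parameter.
grid = {
    "activation": ["relu", "sigmoid", "tanh", "elu"],
    "optimizer": ["sgd", "adam", "rmsprop"],
    "batch_size": [16, 32, 64, 128],
    "dropout": [0.0, 0.25, 0.5],
}

def count_combinations(grid):
    """Number of distinct configurations an exhaustive grid search must try."""
    total = 1
    for values in grid.values():
        total *= len(values)
    return total

print(count_combinations(grid))  # 4 * 3 * 4 * 3 = 144 for this toy grid
```

Even this toy grid of four parameters already needs 144 full training runs to search exhaustively; scale each dimension up to a hundred values and the number becomes astronomical.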
On top of the model, you need to choose an architecture, like ResNet or AlexNet; there are hundreds of them. Each architecture can have any number of layers.
And then you need to run iterations while training the model. In that competition, there were only 650 images in the training set, so each iteration used to take only about 5 minutes on a GPU. "The more the iterations, the better the efficiency"? Not the case. Many a time we found efficiency was good after 93 iterations, improved after 94 iterations, and then fell significantly after 95 iterations. So if you are going by brute force, you need to check the results after every iteration. The maximum number of iterations we could test in those 2 months was 200. There were teams which did 12,000 iterations.
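To make that brute-force loop concrete, here is a minimal sketch. The training and evaluation functions are stand-ins, and the numbers are synthetic, tuned only to mimic the non-monotonic behaviour described above (efficiency peaking around iteration 94, then falling), not real measurements:

```python
import random

random.seed(42)

def train_one_iteration(model_state):
    """Stand-in for one training pass; returns the updated model state."""
    return model_state + 1

def evaluate(model_state):
    """Stand-in for validation efficiency: noisy and non-monotonic,
    peaking near iteration 94, like the competition runs described above."""
    return 0.6 - 0.004 * abs(model_state - 94) + random.uniform(-0.01, 0.01)

# Because efficiency can peak at iteration 94 and fall at 95, brute force
# means evaluating after *every* iteration and keeping the best checkpoint.
state, best_eff, best_iter = 0, float("-inf"), 0
for i in range(1, 201):  # 200 iterations was our two-month budget
    state = train_one_iteration(state)
    eff = evaluate(state)
    if eff > best_eff:
        best_eff, best_iter = eff, i  # snapshot this checkpoint

print(best_iter, round(best_eff, 3))
```

Multiply that evaluate-every-iteration discipline by 5 minutes per iteration, and then by every model, architecture, and hyper-parameter combination, and the cost of brute force becomes clear.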
You can also choose to train layer-wise and monitor the effect on efficiency. You may choose to train only the head layers initially.
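In code, layer-wise training just means flagging which layers are trainable and which are frozen. Here is a framework-agnostic sketch; the `Layer` class and the layer names are hypothetical, but real frameworks expose an equivalent per-layer trainable flag:

```python
class Layer:
    """Hypothetical stand-in for a network layer with a trainable flag."""
    def __init__(self, name, trainable=True):
        self.name = name
        self.trainable = trainable

# A typical detection net: a pretrained backbone plus task-specific heads.
backbone = [Layer(f"backbone_{i}") for i in range(5)]
heads = [Layer("rpn_head"), Layer("mask_head")]
model = backbone + heads

def freeze_all_but_heads(model, head_names):
    """Phase 1 of layer-wise training: update only the head layers."""
    for layer in model:
        layer.trainable = layer.name in head_names

freeze_all_but_heads(model, {"rpn_head", "mask_head"})
trainable = [layer.name for layer in model if layer.trainable]
print(trainable)  # only the head layers remain trainable
```

A later phase can then unfreeze the backbone for fine-tuning; each such schedule is yet another knob to tune.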
Also, there’s something called the learning rate. It can take any value between 0 and 1. The higher the rate, the faster the training moves, but the model learns only high-level features; the lower the rate, the closer training gets to taking an almost infinite time, though the model will be learning even the very fine nuances.
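You can see the trade-off even on a toy problem. The sketch below runs plain gradient descent on f(x) = x², which is not a neural net, but shows the same behaviour: a tiny rate barely moves within the step budget, a moderate rate converges, and a rate pushed too high overshoots and diverges:

```python
def gradient_descent(lr, steps, x0=10.0):
    """Minimise f(x) = x^2 (gradient is 2x) starting from x0.
    Returns the final distance from the minimum at x = 0."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return abs(x)

# A tiny learning rate barely moves within the same budget of steps...
slow = gradient_descent(lr=0.001, steps=100)
# ...a moderate one converges to (near) the minimum...
good = gradient_descent(lr=0.1, steps=100)
# ...and one that is too large overshoots the minimum and diverges.
diverging = gradient_descent(lr=1.1, steps=100)

print(slow, good, diverging)
```

Real training adds noise, schedules, and interactions with every other hyper-parameter on top of this, which is exactly why the "right" rate is so hard to find.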
I guess you have a fair idea by now of how difficult it is to find that one set of parameters which will give the best efficiency. If you go by trial and error, you'll practically end up with an almost infinite number of possibilities to test.
In this competition the training dataset was very small; later, in a few projects, we had to train on thousands of images, and testing one iteration used to take 3–4 days on a GPU.
Only if you understand well the mathematics behind neural nets (which, by the way, is really complex), know exactly how they work, and are intelligent enough to connect the dots between the nature of the problem at hand and the parameters to be tuned, can you narrow the possibilities down to a practically manageable range.
OR, there's the possibility that someone has thousands of hours of experience in tuning models; his hindbrain might have learned the patterns subconsciously, and so, based on his intuition, he can get good results. But keep in mind that before the 2015 ImageNet results, no one was working on deploying Neural Nets barring a few believers, and even they only on straightforward use cases. So no one had that kind of experience with complex cases.
Finally, we ended up at Rank 54 in that competition, out of 3000+ teams from all across the globe.
We then talked to a few friends in the field and found out that everyone who had been working on software and analytics had started offering AI services. The market got flooded with Vendors, but when we talked to Enterprises we found that hardly any project moved beyond PoC, as none delivered efficiency worth deploying. It was a sad state, as Enterprises were getting apprehensive about whether AI really works or is just hype.
We met a company which was working on building a solution that could read hand-filled forms. They had tried all the solutions available in the market, including OCR/ICR, but none worked. A few Deep Learning based solutions were also available, but they too failed to give any good efficiency. Their data was too noisy: photocopied forms scanned on not-very-good scanners. Moreover, it was for a big financial Institution, and they couldn't afford more than a limited number of inaccuracies. They could deploy the solution only if it reached almost human-level efficiency.
They asked us for help.
At first glance, this problem appeared impossible to crack. We said, 'We can try, but we are not promising anything. We are not sure whether the technology to achieve this feat even exists.'
They nevertheless decided to work with us.
We worked on it for six months and tried all the tools available in the market; we even tried Google's and Microsoft's Vision APIs, but efficiency was not moving above 80% at the field level. We tried thousands of combinations of iterations and learning rates (they gave us access to unlimited GPUs). We were using CapsNet, which we found was best suited to this type of problem, and we were training on the publicly available MNIST dataset of 60,000 images. But to no avail. We were about to give up.
Then one fine day, in despair, it occurred to us: "Every Company will always carry a particular kind of bias in its data which will always be unique to it." We did not need bigger data; we needed biased data, to uniquely train the model on that company's peculiar biases.
We asked them to train the model on their own data. It was an expensive task on their part, but somehow they trusted us and put a few of their people on annotating their data. They created a dataset of 26,000 images. We replaced MNIST with their own dataset and, boom: efficiency crossed 90% at the field level and 98% at the character level.
Their human team of around 2000 operators was giving an efficiency of 86%, and the QC team would take that to 99%. Now they do not need the human team, and the pressure on the QC team has decreased too.
We met many other Enterprises which were struggling with their AI adoption.
And that's how we got our idea: it's difficult for humans to tune the models and configure pipelines as per the requirements, but not so difficult for AI. It's just like chess: there is an almost infinite number of possible moves, yet AI can find nearly the best move in real time. The idea was to use AI to deploy AI. This way, since Enterprises won't require deep technical skills, they can train their models on their own data using our AI.
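The simplest version of this idea is automated hyper-parameter search. Below is a minimal random-search sketch; the search space and the scoring function are hypothetical stand-ins (a real system would launch an actual training job for each configuration and measure its efficiency), and production tuners use smarter strategies such as Bayesian optimisation or evolutionary search:

```python
import random

random.seed(0)

# Hypothetical search space; a real one spans models, architectures,
# layers, iterations, learning rates, and more.
search_space = {
    "learning_rate": [1e-4, 1e-3, 1e-2, 1e-1],
    "layers": [2, 4, 8, 16],
    "activation": ["relu", "tanh", "elu"],
}

def sample_config(space):
    """Draw one random configuration from the search space."""
    return {key: random.choice(values) for key, values in space.items()}

def score(config):
    """Stand-in for 'train a model with this config and measure efficiency'.
    Here it just rewards closeness to an arbitrary target configuration."""
    target = {"learning_rate": 1e-2, "layers": 8, "activation": "relu"}
    return sum(config[k] == target[k] for k in target) / len(target)

# Random search: the machine, not the human, walks the space.
best_cfg, best_score = None, -1.0
for _ in range(50):
    cfg = sample_config(search_space)
    s = score(cfg)
    if s > best_score:
        best_cfg, best_score = cfg, s

print(best_cfg, best_score)
```

Replace the toy `score` with a real training run and the loop with a smarter search policy, and you have the skeleton of an automated tuner: AI deploying AI.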
And we launched Oltau.
Oltau is a platform where you can build such solutions without coding, and you'll get the best possible efficiency.