Dan Luu

A cloud migration story from a car company

Once upon a time, a car company was trying to set up operations that let it read live data from all of its cars. Actually, basically all car companies were trying to do it because the data is valuable (selling user data is a lucrative business).

One quirk about the company is that it's split into different orgs that handle operations over different parts of the world and these orgs are constantly at war. So, each of these orgs has its own effort to do this and the orgs don't want shared code, data, etc.

For reasons beyond the scope of this post, the North American org ended up getting a mandate to handle the code for this, although the data was still to be kept separate, with ACLs so that different orgs couldn't "steal" data from each other. But there were a couple of major problems with this.

First, execs at a cloud company had sold the car company on a "data lake" solution with a fancy name. Unfortunately, the cloud company was also internally at war and there were two competing data lake solutions. One of the solutions could solve the problem the car company wanted solved, had proper ACLs, etc. The other was the shiny branded solution that the cloud company execs sold to the car company execs.

Because the car company execs were sold, orders rolled downhill and engineers were required to use the solution that couldn't work. That was the smaller of the two problems. The larger of the two problems was that the North American division of the car company had effectively laid off all of its software staff by requiring its engineers to move from Silicon Valley to a suburb of a red city in a red state that not many SV engineers were interested in moving to. The car company then hired expensive American consultants to solve this problem. The consultants suggested hiring more consultants, so the expensive American consultants drew diagrams of what the systems should look like and then, to save money, outsourced the implementation to cheaper Indian consultants.

Unfortunately for the car company, no part of the solution made sense. At the "boxes and arrows diagram" level, the solution was absurd even though the problem being solved was fairly straightforward. At the time, this was a prototype project that only needed to work for one model of new car sold and very little data was being recorded for each of those cars, so the total amount of data being written was tiny. But the total amount of data being written was still too large for the system to work.

[section below is likely to have some inaccuracies since it's from memory from conversations that happened quite a few years ago] The diagram had things like an input queue which connected directly to Kafka. When asked why Kafka was necessary, an engineer answered "you need a queue", seemingly not understanding that the thing feeding Kafka was a queue. The solution also used Spark for reasons that weren't obvious. When asked why Spark was used, it turned out to be because someone knew that you could connect Spark to Kafka in order to execute Java code on data that was in Kafka. The code necessary was some deserialization code and some compression, which was written in such a way that a very short deserialization function would run on incoming data, which would then get serialized by Spark so that it could be sent over the wire to another Spark node, which would then compress a tiny bit of deserialized data. This could've been run in one function on someone's laptop, but it instead turned into a big, slow Spark cluster sitting behind a big Kafka cluster. The whole thing was too slow to work due to a combination of the convoluted architecture and unoptimized code.
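
To make the "one function" point concrete, here's a minimal sketch of deserializing and compressing in a single process. This isn't the actual code (which was Java, and whose formats aren't described above); the JSON message format, the field names, the function names, and the choice of zlib are all assumptions made purely for illustration.

```python
import json
import zlib


def deserialize_and_compress(raw_messages):
    """Decode each raw message, then compress the batch in the same process."""
    # Assumed wire format (hypothetical): one UTF-8 JSON object per message.
    records = [json.loads(m.decode("utf-8")) for m in raw_messages]
    # Re-encode compactly, one record per line, and compress. There is no
    # network hop or re-serialization step between deserializing and compressing.
    payload = "\n".join(
        json.dumps(r, separators=(",", ":"), sort_keys=True) for r in records
    )
    return zlib.compress(payload.encode("utf-8"))


def decompress_records(blob):
    """Inverse of deserialize_and_compress, so the round trip can be checked."""
    text = zlib.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.split("\n") if line]


if __name__ == "__main__":
    # Made-up sample messages; the field names are illustrative only.
    messages = [
        b'{"vin": "TEST1", "speed_kph": 42}',
        b'{"vin": "TEST2", "speed_kph": 0}',
    ]
    blob = deserialize_and_compress(messages)
    print(len(blob), decompress_records(blob))
```

Both steps operate on data that's already in memory, so the intermediate serialize, send, and deserialize round trip between nodes described above is pure overhead at this data volume.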

Since this was a VIP customer for the cloud company, the cloud company had some of its application engineers sit down for a figurative weekend to whip up a prototype that worked. I guess you could call this a cloud success story in that this car company wasn't able to build a working solution and also wasn't able to hire a consulting company that could build a working solution, but by becoming a very important cloud customer, they were able to get a working solution that probably didn't cost 10000x more than it needed to, which was basically fine given how cheaply the problem could've been solved and how much cars cost.

