I’ve been a Software Engineer at Cxense for four and a half years now, and I’ve loved (almost) every minute of it! The list of everything I’ve learned so far would be very long, so instead I’d like to share a summary of the five best things about working in the R&D team at Cxense. Here goes!
Engineers love tough problems, and you’ll find plenty of them here. Over the years my colleagues and I have worked on thousands of dazzling problems within the areas of distributed databases, data mining, machine learning and development operations, not to mention all the bugfixes. Fixing bugs is actually fun, because it feels good to have traced a flaw and fixed it by the end of the day. Nothing’s perfect, and that’s also why you need to write good unit and integration tests. Writing tests is hard and time-consuming, but it always pays off in the long run. Code reviews provide a great opportunity to learn how other people think and tackle challenges, both for the reviewer and the one being reviewed.
When I met the guys from Cxense for the first time five years ago, they were going live with a large publisher network in Spain. I remember seeing a live chart with the event traffic instantly increasing by ten times or so. The guys were excited to see whether the system would handle the challenge, and it did. Since then, the numbers have increased by many orders of magnitude. We passed the mark of 20 billion page views per month two years ago. Counting all the types of events we collect, that would be above 60 billion events per month, or more than one terabyte of data per day. Crawling is also approaching 110 million web pages per month these days. These numbers are huge, but the transition impresses me even more, because it is impossible to design a system up front for this kind of growth. A solution that is sufficient at one point has to be phased out in favor of a better one at another point, which in turn will become obsolete later. Nevertheless, despite this tremendous growth we are still able to serve real-time data with latency counted in just milliseconds.
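To put those monthly totals in perspective, here is a rough back-of-the-envelope sketch in Python that converts them into per-second rates. It is only an illustration: the 30-day month, the reading of one terabyte as 10^12 bytes, and the helper names are my own assumptions, and since the figures above are lower bounds ("above", "more than"), the results are ballpark values rather than measurements.

```python
# Back-of-the-envelope conversion of the monthly totals quoted above.
# Assumptions (not from the post): a 30-day month and 1 TB = 10**12 bytes.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30


def per_second(count_per_month, days_per_month=DAYS_PER_MONTH):
    """Average rate per second for a monthly total."""
    return count_per_month / (days_per_month * SECONDS_PER_DAY)


def bytes_per_event(bytes_per_day, events_per_month, days_per_month=DAYS_PER_MONTH):
    """Average payload size implied by a daily volume and a monthly event count."""
    events_per_day = events_per_month / days_per_month
    return bytes_per_day / events_per_day


if __name__ == "__main__":
    print(f"page views/s:   {per_second(20e9):,.0f}")    # 20 billion per month
    print(f"events/s:       {per_second(60e9):,.0f}")    # 60 billion per month
    print(f"crawled pages/s: {per_second(110e6):,.1f}")  # 110 million per month
    print(f"bytes/event:    {bytes_per_event(1e12, 60e9):,.0f}")
```

Under these assumptions, 60 billion events per month works out to an average of a little over 23,000 events per second, at roughly 500 bytes per event. Real traffic is not evenly spread across the day, so peaks will be higher than these averages.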
We program mainly in Java, using a common code style, static checks and best practices. We use Git and GitLab for our code and reviews, and a homemade builder that does speculative merge builds and lots of other awesome things, like stability tests and automatic deployment to the staging environment.
Nowadays we’re using Mesos on top of our own hardware and running things with Aurora, using gRPC for communication between services, Consul for service discovery, Prometheus for performance metrics and alerting, and Grafana for monitoring dashboards. What I love most about this is that deployments across multiple data centers and hundreds of machines still take only minutes. In fact, they are only getting faster, even with the growth in scale mentioned in #2.
Each team’s tasks come from four sources: the feature roadmap, support issues, on-call incidents and the technical roadmap. Most teams run two-week iterations, where tickets come from the development roadmap for the given quarter, from pre-triaged support issues that meet the bar, or from the technical roadmap. For features, we have a roadmap for a year ahead, which is then broken down into a specific plan for each quarter for each team. For support, we have several lines of highly trained professionals, who filter out all the noise and leave only actual issues to be fixed. We separated on-call incidents from the rest of the JIRA issues a long time ago, and for the last year or so we have also been working on growing a highly skilled infrastructure team. Finally, the technical roadmap covers things like scalability improvements, swapping out components for better alternatives, or making the system more resilient.
Each iteration starts with a planning phase involving team leads and engineering managers, followed by a meeting where team members are free to pick the issues they want to get their hands on, in line with the amount of work they can handle in the next sprint. The sprint ends with a highlight of the most important things the team has achieved and a demonstration of the most interesting features.
The most important ingredient in a great workplace is people. When joining Cxense I had no expectations other than that the best people in the industry would be here. This turned out to be true, and I am lucky to be working with some of the best engineers and business experts in the field. It’s a pleasure to be a part of the discussions and changes we have made over the years, and of those to come.
Cxense is spread across five of the seven continents. It’s quite common for my day to start with a quick fix for a support issue from Tokyo, followed by a few quick answers to questions from my colleagues in Samara, Munich, Buenos Aires or San Francisco. The changes I made the day before are reviewed by a colleague sitting next to me in Oslo and then distributed to the data centers across the world. After lunch I might focus on the next feature, working with the data from a major news publisher in the USA, Argentina, Norway or Japan. On my way home, I sometimes comment on the global company chat and receive a funny cat picture from a colleague in Melbourne. The world is very small when it comes to Cxense.
Cxense has grown a lot over the years I’ve been working here, but it remains a fast-paced company with talented people, exciting challenges and a cool software stack. Things are getting better, and I can only imagine where we will be in the next five years. While working here I have learned a billion things about software development (and I’m still learning), and I hope that this post was able to bring some of those insights to you.