Roman Elizarov noticed something different about the 2024 Advent of Code:
AI is reshaping competitive programming, not just in benchmarks or papers, but in real life. My rough guess: this year’s Advent of Code leaderboard features ~80% AI-driven entries in the top 100 for the first time. Advent of Code is still an amazing event to sharpen your software engineering skills and have fun. But in just a year, it has lost much of its relevance as a way to compare problem-solving skills among humans.
If a spot on the AoC leaderboard no longer says much about your skill at solving puzzles, does that foreshadow a change in the way interviewers use coding puzzles to evaluate candidates for programming jobs?
Puzzle Bots
For online coding interview services like Codility, the answer is clear: If these screening tools don’t have good cheating detection, their results will become increasingly meaningless. The same is true for online programming contests like Codeforces, which will have to work to stay ahead of LLM users if they want their leaderboards to be relevant.
But let’s assume that companies solve the cheating problem for coding interviews, either through technical advancements or by just holding final interviews in person. So interviewers know that their candidates are solving the interview problems using their own human brains. That doesn’t change the fact that generative AI tools are getting increasingly good at solving coding puzzles. Should interviewers rethink the practice of evaluating humans using these types of puzzles?
One possible answer is that evaluating programmers using coding puzzles in the generative AI era is like evaluating accountants using arithmetic tests, when they’ll be using calculators, spreadsheets, and specialized software tools on the job. According to this argument, if an AI coding tool can reliably solve a problem, then that problem is no longer relevant for evaluating programmers, since even non-programmers could use AI tools to solve it on the job.
But as every working programmer knows, coding interviews already test skills that aren’t relevant for a programmer’s day-to-day work. Most programmers don’t need to implement interview favorites like binary search or topological sort. Low-level algorithms are already implemented in libraries, and most programmers spend their days at a higher level of abstraction.
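As a concrete illustration, here is a minimal Python sketch of the kind of binary search an interviewer might ask a candidate to write, next to the one-line standard-library call that makes writing it by hand unnecessary in day-to-day work:

```python
# Illustrative sketch: the classic interview exercise, followed by the
# standard-library call most working programmers would reach for instead.
from bisect import bisect_left

def binary_search(items, target):
    """Hand-rolled binary search over a sorted list; returns an index or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

def library_search(items, target):
    """The same lookup using Python's built-in bisect module."""
    i = bisect_left(items, target)
    return i if i < len(items) and items[i] == target else -1

data = [2, 3, 5, 7, 11, 13]
assert binary_search(data, 7) == library_search(data, 7) == 3
```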
The goal of a traditional coding interview is not to see what the candidate knows about binary search. Algorithms and data structures are simply a body of knowledge that every computer science student learns, which makes them a convenient shared source of interview problems. A good coding interview isn't a computer science test. It's a way to find out how a candidate thinks through a problem.
So it doesn’t matter that LLMs can solve coding interview problems. Professional programmers won’t be asking AI coding assistants to implement a binary search solution any more than they implemented binary search by hand before the proliferation of coding bots.
Levels of Abstraction
Current generative AI tools are useful for some programming tasks. Programmers may not be asking bots to solve interview problems, but these tools can reduce the tedium involved in writing code to call a web endpoint and process the results. As sophisticated as the newest tools are, however, an inexperienced programmer could easily generate a solution that works for one user but struggles to scale up to the target audience for a new service. This is why interviewers need to evaluate whether a candidate can translate product ideas into code, making the small and large technical decisions that distinguish a prototype from a commercial product.
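The kind of routine code these tools handle well might look something like the following minimal Python sketch; the endpoint URL and the JSON field names here are placeholders, not a real API:

```python
# Minimal sketch of the boilerplate described above: call a web endpoint
# and process the results. The URL and JSON fields are hypothetical.
import json
from urllib.request import urlopen
from urllib.error import URLError

def fetch_active_users(endpoint="https://api.example.com/users"):
    """Fetch a JSON list of users and return the names of the active ones."""
    try:
        with urlopen(endpoint, timeout=10) as response:
            users = json.load(response)
    except (URLError, json.JSONDecodeError) as err:
        raise RuntimeError(f"Could not fetch users: {err}") from err
    return [u["name"] for u in users if u.get("active")]
```

Generating this kind of routine glue code is where today's tools help most; deciding how it should behave when the service grows beyond one user is where the programmer's judgment still comes in.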
We can think about this in terms of levels of abstraction. Given the right prompts, LLM bots can generate code at the function and class level. And they can discuss high-level design concepts. But programmers still need to put the pieces together and evaluate whether the generated output does what the bot claims it does.
But what if tools arrive that can bridge the gap between low-level code and high-level architecture, and can do this reliably? We could imagine a product manager uploading a functional specification to an AI bot, and the bot asking the clarifying questions required to transform the functional requirements into a detailed technical specification. It would then generate code, instantiate cloud services, and deploy a working system. The bot could even ask pilot users to try the system out, collect feedback from them, and make improvements.
If bots prove that they're able to do this reliably, the argument above for keeping coding interviews breaks down. We may reach the point where human programmers aren't writing functions and classes, or even reviewing generated code. And the standard coding interview may finally become irrelevant.
The autonomous bot scenario seems much closer now than it did a few years ago. If it comes to fruition, the nature of a programming job may change drastically. But as with all technical innovations, the best way to prepare for this potential change is to adopt the new generative AI tools as they become available, testing their limits and using them for everything they can do.