CPFAQ: Patterns in Question Titles

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Now that I have a database of competitive programming questions, I thought it would be interesting to look for patterns in the way that questions are asked. These patterns will be useful in writing canonical question titles, which in turn will determine the set of questions included in the FAQ.

I briefly considered loading the list of questions into a text mining program, and even tried out a free online tool. But ultimately I decided that was overkill, so I wrote a simple program to find the words that are most frequently used to start the questions in my list.
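As a rough illustration of that kind of program (not the actual code used for this article), here is a minimal Python sketch. The function names first_word and first_word_shares are hypothetical, and the sample titles are a handful of questions quoted elsewhere in this article rather than the real 16.5k-question set. It assumes the titles are already loaded into a list of strings.

```python
from collections import Counter

def first_word(title):
    """Return the first word of a title, trimmed of punctuation and capitalized.

    Returns an empty string if the title has no usable first word.
    """
    words = title.split()
    if not words:
        return ""
    # Normalize so that "how", "How" and "HOW" are counted together.
    return words[0].strip(".,:;!?\"'()").capitalize()

def first_word_shares(titles):
    """Return (word, count, share) tuples, most frequent first.

    The share is the fraction of non-empty titles starting with that word.
    """
    counts = Counter(w for w in (first_word(t) for t in titles) if w)
    total = sum(counts.values())
    if total == 0:
        return []
    return [(word, n, n / total) for word, n in counts.most_common()]

# Hypothetical sample input: a few titles quoted in this article.
SAMPLE_TITLES = [
    "How do I learn competitive programming as a beginner?",
    "How is competitive programming different from real-life programming?",
    "What can I do to get better at algorithms?",
    "Who are the top competitive programmers who work for Apple?",
]

for word, n, share in first_word_shares(SAMPLE_TITLES):
    print(f"{word}: {n} ({share:.0%} of set)")
```

Normalizing the case and stripping punctuation keeps variants of the same opening word in one bucket, which matters more for this kind of count than any heavier text mining would.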

My question set has about 16.5k questions, gathered from the all_questions pages of topics related to competitive programming. It’s not the complete set of CP questions on Quora, since Quora makes it hard to gather a complete set of anything. But I suspect that it’s a significant fraction of the complete set.

Here are the words that most frequently start question titles.

How (27% of set)

Over half of the questions in the set start with one of two words, How and What, and over half of those start with How.

Competitive programming is a skill to be learned, and people want to know how to learn it. For example, How do I learn competitive programming as a beginner? is one of the most followed questions in the set. Other popular questions ask how best to learn algorithms and data structures, dynamic programming, and math.

How is also used to ask how a specific person or group (Gennady Korotkevich, Anudeep Nekkanti, Russians) achieved competitive programming greatness.

And How also starts the famous Quora competitive programming question How is competitive programming different from real-life programming?

What (25% of set)

The other half of the dominant word duo is What.

Some of the What questions are just reworded How questions, like the #1 most-followed question in the set, What is the best strategy to improve my skills in competitive programming in C++ in 2-3 months? or What can I do to get better at algorithms?

Others are straightforward requests for information on topics like algorithms and data structures, courses, and books.

And a few popular What questions are looking for personal experiences about coding with Gennady Korotkevich, working with Petr Mitrichev, or attending the Facebook Hacker Cup.

Is (7% of set)

A question that starts with Is could be answered with a simple Yes or No. Although Quora users usually provide a longer answer, most Is questions would be clearer if they started with How or What.

For example, here’s a popular Is question: Is there a stepwise explanation (beginner to advance) to solve the dynamic programming problem using bitmasking? It would be clearer to ask, “How can I solve dynamic programming problems using bit masking?”

Is questions are also used to ask the ever-popular time-based Quora questions. For example:

Although the value of these questions is debatable, rephrasing them as How long should I spend questions would at least provide better canonical merge targets.

In some cases, an Is question is the best choice. For example: Is it good practice to use #include <bits/stdc++.h> in programming contests instead of listing a lot of includes? It’s certainly better than the Stack Overflow equivalent, Why should I not #include <bits/stdc++.h>?

Why (6% of set)

Why questions tend to be less practical than How or What questions. People often ask Why questions to satisfy their curiosity, not to get advice that they’re going to take action on. Here’s one popular Why question in that category: Why are programmers in the software engineering job interviews tested on skills similar to a Topcoder contest irrespective of the fact that the skills required in the industry are entirely different?

I (6% of set)

Starting a question with I is just a way to provide context before asking the real question. I questions contain one or more declarative sentences, followed by a question. The most popular example by far: I am planning to quit my job and study algorithms full-time for one year. My target is to train my algorithms skill for preparing a Google interview. Can anyone give me some advice?

Which, Can, Where, etc. (28% of set)

The remaining 28% of the questions in the set start with other words: Which (3%), Can (3%), Where (2%), Are (2%), Who (2%), Do (3%), and over 200 others that each represent less than 1% of the set.

Who is a special case, since it’s the most direct way to ask about specific people (e.g., Who are the top competitive programmers who work for Apple?).

But many of the questions in this section are just re-worded variations on popular themes: