Red-Green-Code

Deliberate practice techniques for software developers

  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter

CPFAQ: Classifying Quora Questions

By Duncan Smith, Mar 14

(Image: Sorting)

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

On Quora, it’s common to see the same questions, or variations of the same questions, show up repeatedly. The competitive programming topic is no exception to this rule. Merging similar questions is an option, but it presents a couple of challenges:

  • Quora Content Review (QCR, a human + bot system) has its own merging rules, and will unmerge questions that it thinks are insufficiently similar. For example, here’s a question with over 2,000 followers and 60 answers: “What basic data structures and algorithms should one learn before starting competitive programming?” I tried to merge in a similar question, “What are the data structures and algorithms used in competitive programming?” (3 followers, 1 answer), but QCR quickly unmerged it. A FAQ could collect these similar questions even when they can’t be merged on Quora.

  • It can be time-consuming to find the best merge target, especially given QCR rules. Quora’s merge suggestion list often doesn’t provide the best merge choice. A Google search works better, but for frequently asked questions, it’s more efficient to search a FAQ than to search all of Quora.

A competitive programming FAQ that is independent of Quora can resolve these problems. If QCR doesn’t want questions to be merged, they can still be listed under the same question in the FAQ. And the FAQ can categorize questions in a way that makes it easy to find appropriate merge targets.

Using some of the data I collected in previous weeks, I have an initial list of categorized questions to use as a starting point for the FAQ categories.

Collecting Questions

In past weeks, I have collected a set of Quora topics related to competitive programming. I also have a tool that can extract the list of questions from a topic page. This week, I extended the tool to accept a list of topic pages and collect the complete set of questions from all of the topic pages. That allows me to get all of the questions from my topic list into one master question list.
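
As a rough sketch of that merging step (not the actual tool), the following Python combines the question lists from several topic pages into one master list. The function names, the topic paths, and the sample questions are all made up for illustration, and the sketch assumes that the same question can appear under more than one topic, so it drops repeats.

```python
from typing import Callable, Iterable


def collect_master_list(
    topic_urls: Iterable[str],
    extract_questions: Callable[[str], list[str]],
) -> list[str]:
    """Merge the question lists of several topic pages, dropping repeats."""
    seen: set[str] = set()
    master: list[str] = []
    for url in topic_urls:
        for question in extract_questions(url):
            # Compare case-insensitively so trivial differences don't
            # produce duplicate entries (an assumption of this sketch).
            key = question.strip().lower()
            if key not in seen:
                seen.add(key)
                master.append(question.strip())
    return master


# Stand-in for the real extraction tool: a fixed mapping from a
# hypothetical topic path to the questions listed on that page.
FAKE_PAGES = {
    "topic/Competitive-Programming": [
        "How do I get better at competitive programming?",
        "What is an online judge?",
    ],
    "topic/Training-for-Competitive-Programming": [
        "How do I get better at competitive programming?",
        "How should I practice for ICPC?",
    ],
}

master_list = collect_master_list(FAKE_PAGES, lambda url: FAKE_PAGES[url])
for question in master_list:
    print(question)
```

Passing the extractor in as a parameter keeps the merging logic separate from the page-scraping logic, so the merge can be tried out on canned data like the dictionary above.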

One of the useful statistics on the all_questions page is a follower count for each question. Using XPath, this number can be extracted from the FollowSecondaryActionItem div. The advantage of follower count is that it indicates interest in a question, even if the question has few or no answers. My master question list currently contains over 17,000 questions. Categorizing all of those manually would take a long time, so it’s useful to sort by the number of followers and start with questions that have hundreds or thousands of people interested in them, rather than the long tail of questions that are only interesting to a few people.
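
Here is a minimal sketch of the extract-and-sort idea, using the limited XPath support in Python’s standard xml.etree.ElementTree module. Only the FollowSecondaryActionItem class name comes from the page described above. The rest of the sample markup, the question_link class, the placeholder titles, and the follower counts are invented for illustration, and a real Quora page would most likely need a proper HTML parser rather than an XML one.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample markup. Only the class name
# FollowSecondaryActionItem is taken from the real page.
SAMPLE_PAGE = """
<div class="question_list">
  <div class="question">
    <a class="question_link">Sample question A</a>
    <div class="FollowSecondaryActionItem">Follow <span>3</span></div>
  </div>
  <div class="question">
    <a class="question_link">Sample question B</a>
    <div class="FollowSecondaryActionItem">Follow <span>2,041</span></div>
  </div>
  <div class="question">
    <a class="question_link">Sample question C</a>
    <div class="FollowSecondaryActionItem">Follow <span>57</span></div>
  </div>
</div>
"""


def parse_followers(page_xml: str) -> list[tuple[str, int]]:
    """Return (title, follower_count) pairs found in the page."""
    root = ET.fromstring(page_xml.strip())
    results = []
    for q in root.findall(".//div[@class='question']"):
        title = q.find("a[@class='question_link']").text.strip()
        count_text = q.find("div[@class='FollowSecondaryActionItem']/span").text
        # Counts may contain thousands separators, e.g. "2,041".
        results.append((title, int(count_text.replace(",", ""))))
    return results


def sort_by_followers(questions: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """Most-followed questions first."""
    return sorted(questions, key=lambda item: item[1], reverse=True)


ranked = sort_by_followers(parse_followers(SAMPLE_PAGE))
for title, followers in ranked:
    print(followers, title)
```

With the list sorted this way, manual categorization can start at the top and stop wherever the follower counts get too small to be worth the effort.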

Question Categories

I’m taking two complementary approaches to reducing the thousands of related questions to a manageable list of FAQs: canonical question titles, and question tagging.

Canonical question titles

Popular questions are asked in different ways using slightly different wording. But the answers to these questions mostly ignore the subtle differences in question wording, and focus on a few key ideas. This is an argument for collecting answers under a canonical question title. Quora supports the idea of canonical questions, and they’re trying various techniques to make questions more canonical. But they’re doing it at the scale of hundreds of thousands of topics, and I’m organizing fewer than 200. So I think I can come up with better canonical titles than the various Quora content control processes, or generalist content gnomes.

Here’s the canonical wording I’m currently using for some popular questions. Each one is linked to one of the Quora questions that would be listed under that canonical title:

  • How do I get better at competitive programming?
  • How do I learn competitive programming as a beginner?
  • What algorithms and data structures do I need to know for competitive programming?
  • How did [person] become a top competitive programmer?
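
One simple way to represent canonical titles is a lookup table from each Quora wording to the title it is filed under. The sketch below is an illustration only: the function name is hypothetical, and filing the two example questions from earlier in this article under the algorithms and data structures title is my assumption for the example, not a finished categorization.

```python
from collections import defaultdict

ALGO_DS = (
    "What algorithms and data structures do I need to know "
    "for competitive programming?"
)

# Quora wording -> canonical FAQ title (assumed grouping, for illustration).
CANONICAL_FOR = {
    "What basic data structures and algorithms should one learn "
    "before starting competitive programming?": ALGO_DS,
    "What are the data structures and algorithms used in "
    "competitive programming?": ALGO_DS,
}


def group_by_canonical(questions, canonical_for):
    """Split questions into {canonical title: [variants]} and a leftover list."""
    groups = defaultdict(list)
    uncategorized = []
    for question in questions:
        canonical = canonical_for.get(question)
        if canonical is None:
            uncategorized.append(question)
        else:
            groups[canonical].append(question)
    return dict(groups), uncategorized


sample_questions = list(CANONICAL_FOR) + ["A question nobody has filed yet?"]
groups, uncategorized = group_by_canonical(sample_questions, CANONICAL_FOR)
print(len(groups[ALGO_DS]), "variants under:", ALGO_DS)
print("Still to categorize:", uncategorized)
```

Keeping the leftover list separate makes it easy to see how much of the master question list still needs a canonical title.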

Question tagging

Another way to organize questions is to tag them. On Quora, tags are called topics, and I have been using them to collect questions. But as with Quora’s version of canonical questions, Quora’s topic ontology also has to work for millions of questions on every conceivable subject. An ontology designed specifically for competitive programming doesn’t have that requirement.

As I found a couple of weeks ago in my discussion with the topic gnomes, Quora’s competitive programming topic ontology needs some work. For now, I’m going to work on tagging questions independently of Quora, and then see how the result can be merged into the Quora ontology.

Here are a few tags to start with:

  • competitive-programming: Most questions in the FAQ will be tagged with this one, but perhaps not all. For example, a question like “What books should I read to learn about algorithms and data structures?” might not have that tag.
  • training: For questions about techniques to help practice competitive programming. There’s an equivalent Quora topic called Training for Competitive Programming. My tag ontology can be more concise, since the overall topic is a given and doesn’t need to be repeated in each tag. In other words, the competitive programming context is assumed for each tag.
  • algorithms-and-data-structures: I think it’s best to combine these two in a single tag, since in practice there’s not much point in having a data structure with no algorithm to operate on it, or an algorithm with no data structure for storing results.
  • online-judge: For questions about competitive programming competition sites in general.
  • specific-online-judge: For questions about specific competitive programming competition sites. I borrowed this topic organization from Quora, which has topics like Specific Competitive Programming Competitions and Specific SPOJ Problems (which currently just contains other topics).
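
A tag set like this can be sketched as a small index from question to tags. In the Python below, the tag names come from the list above, but the helper functions and the tag assignments for the two sample questions are hypothetical. Rejecting unknown tags is a design choice of the sketch, meant to keep the ontology small and deliberate.

```python
# Tag names from the list above.
KNOWN_TAGS = frozenset({
    "competitive-programming",
    "training",
    "algorithms-and-data-structures",
    "online-judge",
    "specific-online-judge",
})


def tag_question(index: dict[str, set[str]], question: str, *tags: str) -> None:
    """Attach tags to a question, rejecting tags outside the ontology."""
    unknown = set(tags) - KNOWN_TAGS
    if unknown:
        raise ValueError(f"Unknown tags: {sorted(unknown)}")
    index.setdefault(question, set()).update(tags)


def questions_with_tag(index: dict[str, set[str]], tag: str) -> list[str]:
    """Return all questions carrying the given tag, in alphabetical order."""
    return sorted(q for q, tags in index.items() if tag in tags)


tag_index: dict[str, set[str]] = {}
# Assumed tag assignments, for illustration only.
tag_question(
    tag_index,
    "How do I get better at competitive programming?",
    "competitive-programming",
    "training",
)
tag_question(
    tag_index,
    "What books should I read to learn about algorithms and data structures?",
    "algorithms-and-data-structures",
)

print(questions_with_tag(tag_index, "training"))
```

Because a question can carry several tags at once, the same question can show up under both a broad tag and a narrow one without being duplicated in the index.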

I think the combination of question tagging and canonical questions will help describe the complete set of questions that people ask about competitive programming.

(Image credit: Drew Stephens)

Categories: CPFAQ

Copyright © 2025 Duncan Smith