Red-Green-Code

Deliberate practice techniques for software developers

  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter

CPFAQ: Collecting Quora Topics

By Duncan Smith, Feb 14

Quora Related Topics

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

In recent weeks, I have been experimenting with ways to collect Quora questions, especially those that don’t appear in standard views like search engine results and the All Questions page. But I also want to make sure that the questions I collect from alternative sources are relevant, since I eventually need to manually evaluate the best questions for a FAQ. Last week I started filtering on the Competitive Programming topic tag, but I realized that this can filter out relevant questions. To see why that is, I’m investigating how Quora topics work.

Quora Topics

To categorize questions and help users find them, Quora allows questions to be tagged with one or more topics. When tagging questions, users can select existing topics or create new topics. A bot called the Quora Topic Bot also tags questions.

Each Quora topic is associated with a set of pages of the form www.quora.com/topic/[TopicName]/[PageType], where PageType can be:

  • (empty) or read: The topic home page, which contains a feed of questions related to that topic.
  • all_questions: A list of all questions tagged with that topic. For large topics, the page uses “infinite scroll,” and you may never actually see all of the questions.
  • followers: The Quora users who follow the topic.
  • log: The edits that have been made to the topic itself.
  • writers: The most viewed writers in the topic.
  • faq: Up to 10 frequently asked questions for the topic.
  • links: A new Quora feature that isn’t used much yet.
  • top_questions: A view intended to help users find questions to answer.
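
As a rough illustration, the URL pattern above can be assembled with a small helper. The function name topic_url, the https scheme, and the replacement of spaces with hyphens in the topic name are assumptions made for this sketch; the page type names come from the list above.

```python
# Page types from the list above; "" stands for the topic home page.
PAGE_TYPES = {
    "", "read", "all_questions", "followers", "log",
    "writers", "faq", "links", "top_questions",
}


def topic_url(topic_name, page_type=""):
    """Build a topic page URL (hypothetical helper, not a Quora API)."""
    if page_type not in PAGE_TYPES:
        raise ValueError(f"unknown page type: {page_type!r}")
    # Assumption: spaces in a topic name become hyphens in the URL.
    slug = topic_name.strip().replace(" ", "-")
    url = f"https://www.quora.com/topic/{slug}"
    return f"{url}/{page_type}" if page_type else url


print(topic_url("Competitive Programming", "all_questions"))
```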

Most of these topic page types contain the following common elements, which can be extracted from the page HTML using XPath:

  • The topic title, .//h1//span[contains(@class, 'TopicNameSpan')]
  • Statistics about the number of questions, followers, and edits, .//div[contains(@class, 'TopicPageStatsSection')]//strong
  • A list of related topics (which requires a more complex XPath expression to extract).
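
Here is a minimal sketch of the first two extractions, using only the Python standard library. ElementTree does not support the XPath contains() function, so the sketch walks the tree and checks the class attribute by hand, which gives the same result as the two expressions above. The sample markup and its numbers are made up for illustration. A real Quora page is not well-formed XML, so it would need an HTML parser, and a library with full XPath support (lxml, for example) could evaluate the expressions directly.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a topic page. Real markup differs.
SAMPLE_PAGE = """
<html><body>
  <h1><span class="TopicNameSpan TopicName">Competitive Programming</span></h1>
  <div class="TopicPageStatsSection">
    <strong>20k</strong> Questions
    <strong>290k</strong> Followers
    <strong>1k</strong> Edits
  </div>
</body></html>
""".strip()


def has_class(element, fragment):
    """Mimic XPath contains(@class, fragment)."""
    return fragment in element.get("class", "")


def topic_title(root):
    """Same result as .//h1//span[contains(@class, 'TopicNameSpan')]."""
    for h1 in root.iter("h1"):
        for span in h1.iter("span"):
            if has_class(span, "TopicNameSpan"):
                return "".join(span.itertext()).strip()
    return None


def topic_stats(root):
    """Same result as .//div[contains(@class, 'TopicPageStatsSection')]//strong."""
    values = []
    for div in root.iter("div"):
        if has_class(div, "TopicPageStatsSection"):
            for strong in div.iter("strong"):
                values.append("".join(strong.itertext()).strip())
    return values


root = ET.fromstring(SAMPLE_PAGE)
print(topic_title(root), topic_stats(root))
```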

Some of the topic page types also contain a list of questions. My eventual goal in analyzing the topic page is to collect and organize these questions.

Related Topics

Just as the question page has a list of related questions, the topic page has a list of related topics. And just as we can use related questions to expand a small question list into a larger list, we can expand a small topic list in the same way. Since topic pages contain question lists, this can lead to more questions, which can lead to more topics, and so on.

The problem with recursively collecting related questions and topics is that it’s easy to collect a large number of irrelevant questions. However, the number of Quora topics is small compared to the number of Quora questions. So I’m starting to manually curate relevant topics, which I’ll then use to filter my question list.
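
The two steps in that plan, pulling the unique topics out of a question list and then keeping only the questions tagged with a curated topic, might look like the sketch below. The dictionary layout (title and topics keys), the function names, and the sample questions are all hypothetical.

```python
def unique_topics(questions):
    """Return the sorted set of every topic used by any question."""
    return sorted({topic for q in questions for topic in q["topics"]})


def filter_by_topics(questions, relevant_topics):
    """Keep questions tagged with at least one curated topic."""
    relevant = set(relevant_topics)
    return [q for q in questions if relevant.intersection(q["topics"])]


# Made-up sample data, for illustration only.
questions = [
    {"title": "How do I get better at contests?",
     "topics": ["Competitive Programming", "Learning"]},
    {"title": "What is a good laptop?",
     "topics": ["Laptops"]},
    {"title": "Is TopCoder still active?",
     "topics": ["TopCoder"]},
]

curated = ["Competitive Programming", "TopCoder"]
kept = filter_by_topics(questions, curated)
print(unique_topics(questions))
print([q["title"] for q in kept])
```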

As a first step, I extracted all of the unique topics from last week’s list of 5900 questions, and manually filtered them. Out of about 7000 topics, I decided that only about 150 were relevant for this project. Here is some trivia about that list of topics:

  • Competitive Programming is by far the most popular topic in the list, with over 20k questions and over 290k followers.
  • The next most popular topics are TopCoder (>4k questions, >128k followers), CodeChef (>4k questions, >134k followers), and Algorithms in Competitive Programming (>3k questions, >1k followers).
  • There are a lot of creative ways to spell Competitive Programming, including Competitive Progarmming, Competive Programm, Compettive Programming, Compitative Programmin, and Compitative Programming (these are all actual topic names, which I’ll get around to merging soon).
  • There are even more correctly-spelled topics that are so close to the main topic that it’s debatable whether it really makes sense to maintain them as separate topics. For example, do we really need Coding Competition, Competitive Coding, and Programming Competitions, in addition to the main topic?
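
Misspelled variants like the ones above could be flagged for manual review with a string similarity check. This is only a sketch of one possible approach, using difflib from the Python standard library. The 0.85 cutoff and the function name are assumptions, and a human would still need to confirm each match before merging anything.

```python
from difflib import get_close_matches

MAIN_TOPIC = "Competitive Programming"

# Topic names taken from the list above.
topic_names = [
    "Competitive Programming",
    "Competitive Progarmming",
    "Competive Programm",
    "Compettive Programming",
    "Compitative Programmin",
    "Compitative Programming",
    "TopCoder",
    "CodeChef",
]


def likely_misspellings(main_topic, names, cutoff=0.85):
    """Return names that are very similar to main_topic, best match first."""
    candidates = [name for name in names if name != main_topic]
    # n must be positive, so guard against an empty candidate list.
    limit = max(1, len(candidates))
    return get_close_matches(main_topic, candidates, n=limit, cutoff=cutoff)


for name in likely_misspellings(MAIN_TOPIC, topic_names):
    print(name)
```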

Next Steps

Starting with my initial list of 150 topics, I’ll recursively collect the set of unique related topics and see if they converge to a reasonable list. If they do, that will become my official list of topics for the Quora portion of this research project. If they don’t, I’ll stop after a few iterations and manually filter the list. Either way, I’ll start using that list to collect and filter questions.
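
The expansion described above can be sketched as a breadth-first loop with a round limit. The get_related callable is a placeholder for whatever code fetches the related topics of one topic, and the related-topic data below is invented for the example.

```python
def expand_topics(seed_topics, get_related, max_rounds=3):
    """Expand a topic set by following related topics.

    Returns (topics, converged). converged is True only if a round
    turned up no new topics before max_rounds ran out.
    """
    known = set(seed_topics)
    frontier = set(seed_topics)
    for _ in range(max_rounds):
        discovered = set()
        for topic in frontier:
            discovered.update(get_related(topic))
        frontier = discovered - known
        if not frontier:
            return known, True
        known |= frontier
    return known, False


# Invented related-topic data, for illustration only.
RELATED = {
    "Competitive Programming": ["TopCoder"],
    "TopCoder": ["Competitive Programming", "CodeChef"],
    "CodeChef": ["TopCoder"],
}


def lookup(topic):
    return RELATED.get(topic, [])


topics, converged = expand_topics(["Competitive Programming"], lookup, max_rounds=5)
print(sorted(topics), converged)
```

If the loop stops without converging, the returned set is the starting point for the manual filtering step.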

Categories: CPFAQ


Copyright © 2021 Duncan Smith