Red-Green-Code

Deliberate practice techniques for software developers

CPFAQ: Fast Classification

By Duncan Smith, Aug 22

[Screenshot: QuoraClassifier]

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

Over the past few months, I’ve been classifying a list of Quora questions. Each question gets a primary category, the topic it is most relevant to. Classification helps organize the FAQ and also allows me to find duplicates to merge on Quora.

Until now, I’ve been classifying questions using a spreadsheet containing one question per row and a column for the primary category. But now that I’m familiar with the types of questions people ask, it only takes a few seconds to decide on a category, so the spreadsheet approach is slowing me down. This week, I wrote a simple classification program, QuoraClassifier, to optimize this process.

QuoraClassifier

The goal of QuoraClassifier is to make it as fast as possible to record a single decision: the primary category for a question. Recording the decision shouldn’t take much longer than deciding on the category.

QuoraClassifier is a simple WPF app. The UI language for WPF programs is XAML, which I used in last year’s project.

Like most of the tools I’ve built for the CPFAQ project, QuoraClassifier reads and writes tab-separated value (TSV) files. When it starts up, it reads a list of question data (titles, URLs, categories, etc.), which it keeps in memory. To save the classification results, it writes a TSV file in the same format. Since this is a quick and dirty tool, not a thoroughly tested application, it writes to a separate output file rather than overwriting the input file, which reduces the risk of data loss. I can then compare the two files to verify that the changes are what I expect. If everything looks good, I can copy the output file onto the input file before the next classification session.
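The read, write, and compare cycle described above can be sketched in a few lines. This is a minimal Python illustration rather than the tool's actual C#/WPF code; the function names and the assumption that the TSV file has a header row are mine, not taken from QuoraClassifier.

```python
import csv


def load_questions(path):
    """Read a TSV file with a header row into a list of dicts, one per question."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def save_questions(rows, fieldnames, path):
    """Write the rows to a TSV file in the same column layout as the input."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def changed_rows(input_path, output_path):
    """Return (before, after) pairs for every row that differs between the files.

    Assumes both files list the same questions in the same order.
    """
    before = load_questions(input_path)
    after = load_questions(output_path)
    return [(b, a) for b, a in zip(before, after) if b != a]
```

Because the output goes to a different path, the input file is never modified during a session, and `changed_rows` gives a quick list of edits to review before copying the output over the input.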

Once the input data loads, QuoraClassifier displays its single screen. The purpose of this UI is to (1) display information about a question and (2) accept a classification choice. For maximum efficiency, the program lets me select a category using a single keystroke, at which point it displays the next question.

As shown in the screenshot at the top of this post, the UI includes the following fields:

  • Previous category: The category assigned to the previous question. Since the program moves to the next question as soon as it receives the classification choice for the current question, this field provides a way for me to check that I classified the previous question the way I wanted.
  • Question title: The question title (from Quora).
  • Question URL: The Quora URL (not clickable).
  • View in Browser: A clickable version of the URL. This is a quick way to open the Quora question page, for cases when the title doesn’t provide enough information to classify the question.
  • Primary category: The primary category classification for the current question. Blank if it still needs classification.
  • Canonical title: A combo box containing a list of canonical titles that I have previously assigned to questions in this category, sorted in descending order by how often I used them. I haven’t implemented this yet, but I plan to use it to assign canonical titles once I’m done assigning categories.
  • Statistics: Total number of questions; number of questions with primary categories assigned; number of questions with canonical titles.
  • Last saved timestamp: When the output file was last written.
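As a rough illustration of the Statistics and Canonical title fields, the sketch below computes the three counts and orders canonical titles by how often they have been used. It is Python rather than the tool's own language, and the dictionary keys `category` and `canonical_title` are hypothetical column names chosen for this example.

```python
from collections import Counter


def statistics(questions):
    """Return (total, with primary category, with canonical title)."""
    total = len(questions)
    categorized = sum(1 for q in questions if q.get("category"))
    titled = sum(1 for q in questions if q.get("canonical_title"))
    return total, categorized, titled


def canonical_titles_for(questions, category):
    """List the canonical titles used in a category, most frequently used first."""
    counts = Counter(
        q["canonical_title"]
        for q in questions
        if q.get("category") == category and q.get("canonical_title")
    )
    # most_common() sorts by count in descending order.
    return [title for title, _ in counts.most_common()]
```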

Instant Classification

To get through the questions in my list, I need to optimize the classification process. There’s a lower limit to how fast I can read a question title and decide how to classify it. For questions related to competitive programming, I’m probably already at the lower limit. So the only way to speed up the process is to optimize the act of recording a classification.

I decided that the fastest way to classify a question is using a single alphabetic key on the keyboard. Clicking with the mouse or tapping the screen is intuitive for many applications, but for speed, it’s hard to beat a keyboard interface. I assigned a unique letter to each category name. Since multiple categories start with the same letter, I couldn’t always use the first letter of the category. But it doesn’t take long to learn the letter associations: for example, P for People and R for PRoblems. An alternative scheme, such as pressing P once for People and twice for Problems, would just slow things down.

Here’s the category list:

  • A: Algorithms and data structures for competitive programming
  • B: Getting Better at competitive programming
  • C: Programming Contests
  • E: Exclude this question from the list (for off-topic questions)
  • G: General questions about competitive programming
  • H: MatH for competitive programming
  • I: Interviews and jobs in programming fields
  • K: BooKs about competitive programming
  • L: Programming Languages for competitive programming
  • M: Time Management for competitive programming practice
  • N: OrganizatioNs that competitive programmers join
  • O: Online judges
  • P: People who practice competitive programming (a.k.a. Competitive Programmers)
  • R: Competitive programming PRoblems (programming puzzles)
  • S: Getting Started with competitive programming
  • T: Tools for competitive programming
  • U: CoUrses for learning competitive programming
  • V: Competitive programming Vs. other types of programming (professional, hobbyist, academic, etc.)
  • W: Websites for studying competitive programming (other than online judge sites)
  • Y: Competitive programming coaches
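The single-keystroke scheme amounts to a lookup table plus a step forward. Here is a minimal Python sketch of that idea, not the WPF key handler itself. The letter keys are read off the capitalized letters in the list above, and the short category labels, the `category` field name, and the `classify` function are assumptions made for this example.

```python
# One key per category. Short labels are hypothetical abbreviations
# of the full category names in the list above.
KEY_TO_CATEGORY = {
    "A": "Algorithms", "B": "Getting Better", "C": "Contests",
    "E": "Exclude", "G": "General", "H": "Math",
    "I": "Interviews", "K": "Books", "L": "Languages",
    "M": "Time Management", "N": "Organizations", "O": "Online Judges",
    "P": "People", "R": "Problems", "S": "Getting Started",
    "T": "Tools", "U": "Courses", "V": "Versus",
    "W": "Websites", "Y": "Coaches",
}


def classify(questions, index, key):
    """Record the category for the current question and return the next index.

    Keys with no category assigned are ignored and leave the index unchanged.
    """
    category = KEY_TO_CATEGORY.get(key.upper())
    if category is None:
        return index
    questions[index]["category"] = category
    # Advance immediately, stopping at the last question.
    return min(index + 1, len(questions) - 1)
```

A dictionary lookup keeps every category exactly one keypress away, which is the property the letter assignments are designed to preserve.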

And a few more keyboard commands:

  • Ctrl-F (Find): Navigate to the next uncategorized question.
  • Ctrl-S (Save): Write the output TSV.
  • Right/Left Arrow: Navigate to the next/previous question without changing a classification.
  • Spacebar: Navigate to the next question without changing a classification.
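The navigation commands can be sketched the same way. This Python example is an illustration under assumptions: the command strings, the `category` field name, and the choice to search forward from the current question without wrapping around are mine, since the post does not say how Ctrl-F behaves at the end of the list. Ctrl-S is left out because it only triggers the file write.

```python
def next_uncategorized(questions, start):
    """Return the index of the first uncategorized question after `start`.

    If there is none, stay on the current question.
    """
    for i in range(start + 1, len(questions)):
        if not questions[i].get("category"):
            return i
    return start


def navigate(questions, index, command):
    """Return the new question index for a navigation command."""
    last = len(questions) - 1
    if command in ("Right", "Space"):
        return min(index + 1, last)  # next question, classification unchanged
    if command == "Left":
        return max(index - 1, 0)     # previous question
    if command == "Ctrl+F":
        return next_uncategorized(questions, index)
    return index                     # unknown command: do nothing
```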

With these classification commands, I can classify a question in an average of 6.7 seconds. That’s about 30 hours for the ~16,000 questions in my list, or less than a month of calendar time at an hour or two per day.
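The arithmetic behind that estimate, using the figures above (6.7 seconds per question and roughly 16,000 questions), works out as follows. The variable names are just for this example.

```python
SECONDS_PER_QUESTION = 6.7
QUESTION_COUNT = 16_000  # approximate size of the list

total_hours = SECONDS_PER_QUESTION * QUESTION_COUNT / 3600
days_at_one_hour = total_hours / 1
days_at_two_hours = total_hours / 2

print(f"{total_hours:.1f} hours total")             # 29.8 hours total
print(f"{days_at_one_hour:.1f} days at 1 hour/day")   # 29.8 days at 1 hour/day
print(f"{days_at_two_hours:.1f} days at 2 hours/day")  # 14.9 days at 2 hours/day
```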

Categories: CPFAQ

Copyright © 2021 Duncan Smith