CPFAQ Archives - Page 5 of 6

CPFAQ: A Question Database

By Duncan Smith, Mar 21

Database Schema

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

So far this year, I’ve been building tools that operate on text files in tab-separated values (TSV) format. The advantage of this format is that it’s easy to read from and write to in code, and it imports directly into Excel for manual processing. For example, last week I worked on a TSV file in which each line contains one Quora question title, link, and follower count. I extracted this information automatically using one of my tools. I then imported the TSV file into Excel so I could manually edit each row to add a canonical question and set of tags.
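As a minimal sketch of why TSV is convenient in code, the following Python reads and writes rows like those described above using the standard csv module. The column order (title, link, follower count), the sample rows, the example.com links, and the function names parse_questions and to_tsv are assumptions for illustration, not the format or code of the actual tool.

```python
import csv
import io

# Hypothetical sample data: title, link, follower count, separated by tabs.
SAMPLE_TSV = (
    "What are some must-do problems on Codeforces?\thttps://example.com/q1\t12\n"
    "How do I get started with competitive programming?\thttps://example.com/q2\t340\n"
)


def parse_questions(tsv_text):
    """Parse TSV text into a list of dicts with title, link, and followers."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    questions = []
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed lines
        title, link, followers = row
        questions.append(
            {"title": title, "link": link, "followers": int(followers)}
        )
    return questions


def to_tsv(questions):
    """Serialize the list of dicts back to TSV text, one question per line."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for q in questions:
        writer.writerow([q["title"], q["link"], q["followers"]])
    return out.getvalue()


questions = parse_questions(SAMPLE_TSV)
```

Because each row is plain text with a fixed column order, the same file can be round-tripped through code and then opened in a spreadsheet for manual editing.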

As I classify each question, I find it useful to review how previous questions are classified. That helps ensure that question classification is consistent. For example, I classified one question as follows:

Title: What are some good questions on CodeChef from which I will learn more algorithms?
Canonical title: What are some good competitive programming problems?
Tags: competitive-programming, specific-online-judge, competitive-programming-problems

I later came across the question “What are some must-do problems on Codeforces?”, which I thought should have the same classification. So I looked through the current list of classified questions to copy the classification decisions I made earlier.

If I only had a few questions classified, it would be easy enough to scan through the list and find a similar one. But as the list grows longer, that becomes impractical. So I decided this week that it’s time to upgrade my storage technology.
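One way to make earlier classification decisions searchable is to load them into a small database. The sketch below uses Python’s built-in sqlite3 module with an in-memory database. The table name, the column names, and the add_question and find_similar helpers are hypothetical, and this excerpt does not say which storage technology the project actually adopted.

```python
import sqlite3


def create_db():
    """Create an in-memory database with one (assumed) table of questions."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE questions ("
        " title TEXT NOT NULL,"
        " canonical_title TEXT NOT NULL,"
        " tags TEXT NOT NULL)"
    )
    return conn


def add_question(conn, title, canonical_title, tags):
    """Store one classified question; tags are kept as a comma-joined string."""
    conn.execute(
        "INSERT INTO questions (title, canonical_title, tags) VALUES (?, ?, ?)",
        (title, canonical_title, ",".join(tags)),
    )


def find_similar(conn, keyword):
    """Return (title, canonical_title, tags) rows whose title contains keyword.

    SQLite's LIKE is case-insensitive for ASCII text by default.
    """
    cur = conn.execute(
        "SELECT title, canonical_title, tags FROM questions WHERE title LIKE ?",
        (f"%{keyword}%",),
    )
    return cur.fetchall()


conn = create_db()
add_question(
    conn,
    "What are some good questions on CodeChef from which I will learn more algorithms?",
    "What are some good competitive programming problems?",
    ["competitive-programming", "specific-online-judge", "competitive-programming-problems"],
)
matches = find_similar(conn, "good questions")
```

With a lookup like this, classifying a new question about good problems on a particular judge starts with a keyword search rather than a scan of the whole list.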

CPFAQ: Classifying Quora Questions

By Duncan Smith, Mar 14

Sorting

On Quora, it’s common to see the same questions, or variations of the same questions, show up repeatedly. The competitive programming topic is no exception to this rule. Merging similar questions is an option, but it presents a couple of challenges:

Quora Content Review (QCR, a human + bot system) has its own merging rules, and will unmerge questions that it thinks are insufficiently similar. For example, here’s a question with over 2000 followers and 60 answers: “What basic data structures and algorithms should one learn before starting competitive programming?” I tried to merge in a similar question, “What are the data structures and algorithms used in competitive programming?” (3 followers, 1 answer), but it was quickly unmerged by QCR. A FAQ could collect these similar questions, even when they couldn’t be merged on Quora.
It can be time-consuming to find the best merge target, especially given QCR rules. Quora’s merge suggestion list often doesn’t provide the best merge choice. A Google search works better, but for frequently asked questions, it’s more efficient to search a FAQ than to search all of Quora.

A competitive programming FAQ that is independent of Quora can resolve these problems. If QCR doesn’t want questions to be merged, they can still be listed under the same question in the FAQ. And the FAQ can categorize questions in a way that makes it easy to find appropriate merge targets.

Using some of the data collection from previous weeks, I have an initial list of categorized questions to use as a starting point for the FAQ categories.

CPFAQ: Classifying Quora Topics

By Duncan Smith, Mar 7

Competitive Programming Wiki

If you study the current Quora topic ontology for competitive programming, it’s clear that it needs some work. The six top-level topics don’t cover all of the subjects that people ask about. And many related topics aren’t even in the ontology, usually because people created them without selecting parent topics.

Here are two steps to create a better ontology: 1) Evaluate the related topics that already exist, and add each one to the appropriate place in the ontology, and 2) Write a clear description of what questions belong in each topic.

CPFAQ: Quora Topic Ontology

By Duncan Smith, Feb 28

Quora Topic Ontology

This week, I published a post to the Quora Topic Gnomery blog on the subject of competitive programming topic cleanup. It generated some discussion about how competitive programming topics on Quora should be organized, and prompted me to look into the Quora Topic Ontology.

CPFAQ: Quora Topic Cleanup

By Duncan Smith, Feb 21

Gnome

As part of the FAQ research process, I’m creating a canonical list of Quora topics related to competitive programming. Like many things on Quora, topics in this area are a bit messy. Or as the Quora Topic Gnomes say:

Quora’s Topics are a free-for-all, and that often creates greatness, but it really requires crowdsourced curation to make it all it can be.

CPFAQ: Collecting Quora Topics

By Duncan Smith, Feb 14

Quora Related Topics

In recent weeks, I have been experimenting with ways to collect Quora questions, especially those that don’t appear in standard views like search engine results and the All Questions page. But I also want to make sure that the questions I collect from alternative sources are relevant, since I eventually need to manually evaluate the best questions for a FAQ. Last week I started filtering on the Competitive Programming topic tag, but I realized that this can filter out relevant questions. To see why that is, I’m investigating how Quora topics work.

CPFAQ: Collecting Quora Questions, Part 2

By Duncan Smith, Feb 7

Quora tags

I’m building a webliography of competitive programming resources, and I’m currently focusing on Quora questions. So far, I have extracted questions from search engine results and from the All Questions page that Quora generates. But as I mentioned last week, only a small fraction of the available topic questions appear in those locations. Where are the rest?

CPFAQ: Collecting Quora Questions

By Duncan Smith, Jan 31

Competitive Programming on Quora

Last week, I wrote about using search engine results to collect links to import into Webliographer. I used three search engine features: standard search, standard search with duplicates included, and site-specific search. Each of these features had pros and cons: Standard search returned results from more domains, but with fewer results per domain. Standard search with duplicates included reduced the number of domains, but returned more results from some domains. And site-specific search returned many results from a single domain, but not all of the results from that domain. All three techniques enforced a seemingly arbitrary limit of several hundred results. This week, I’m going to use a different technique for getting results from a single site.

CPFAQ: Importing Search Results

By Duncan Smith, Jan 24

The purpose of Webliographer is to collect and manage web references (URLs). A good way to get a baseline set of references on a topic is to import the results of a web search. But if you use a search engine this way, you’ll find some quirks that don’t appear when you’re searching interactively.

CPFAQ: Initial Commit: Webliographer

By Duncan Smith, Jan 17

Bibliography

This week, I made my initial commit to the GitHub repository I’ll be using for my Webliographer project.
