Red-Green-Code

Deliberate practice techniques for software developers

  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter

CPFAQ: A Question Database

By Duncan Smith Mar 21 0

Database Schema

I’m working on a project this year to build a competitive programming FAQ. This is one in a series of articles describing the research, writing, and tool creation process. To read the whole series, see my CPFAQ category page.

So far this year, I’ve been building tools that operate on text files in tab-separate value (TSV) format. The advantage of this format is that it’s easy to read from and write to in code, and it imports directly into Excel for manual processing. For example, last week I worked on a TSV file in which each line contains one Quora question title, link, and follower count. I extracted this information automatically using one of my tools. I then imported the TSV file into Excel so I could manually edit each row to add a canonical question and set of tags.

As I classify each question, I find it useful to review how previous questions are classified. That helps ensure that question classification is consistent. For example, I classified one question as follows:

  • Title: What are some good questions on CodeChef from which I will learn more algorithms?
  • Canonical title: What are some good competitive programming problems?
  • Tags: competitive-programming, specific-online-judge, competitive-programming-problems

I later came across the question What are some must-do problems on Codeforces?, which I thought should have the same classification. So I looked through the current list of classified questions to copy the classification decisions I made earlier.

If I only had a few questions classified, it would be easy enough to scan through the list and find a similar one. But as the list grows longer, that becomes impractical. So I decided this week that it’s time to upgrade my storage technology.

SQL Server

For my Time Tortoise project last year, I used Entity Framework Core and SQLite for data access and storage. This was appropriate for a UWP app where my goals were code quality and maintainability.

But the goal of Webliographer is to produce an artifact (the FAQ), so I’m putting less emphasis on the design purity of the tool itself. With that in mind, I’m choosing a different data access and storage mechanism.

Here’s the strategy:

  • Design a simple data model that is reasonably normalized (avoids data duplication), while allowing new tables and columns to be added as needed.
  • Write SELECT queries to extract data in a way that makes it easy to export to TSV and edit in Excel.
  • Update my C# web data extraction tools to take as input the TSV files that they currently process, and INSERT or UPDATE the database tables.

The goal is to avoid spending too much time fiddling with UI or data access code, and concentrate on getting the data into a useful format for producing a FAQ.

For my database, I decided to use SQL Server 2017. It has a full-featured Developer Edition that is free for non-production use, and the SQL Server Management Studio tool makes it easy to experiment with ad-hoc queries. Not surprisingly, data access from a C# app is also simple.

Data Model

The initial version of the data model has the following tables, based on the classification process from last week:

  • Tag stores a name and description for each question tag.
  • Question stores the text, URL, and number of followers for each question.
  • CanonicalQuestion stores the question text for each canonical question.

The tables are related as follows:

  • Many tags to many questions: A question has one or more tags. Each tag is associated with one or more questions.
  • Many question to one canonical question: A question is associated with exactly one canonical question. Each canonical question groups together one or more questions.

Using this initial data model, I’ll import a set of questions and start building a library of queries to help classify them.

Categories: CPFAQ

Prev
Next

Stay in the Know

I'm trying out the latest learning techniques on software development concepts, and writing about what works best. Sound interesting? Subscribe to my free newsletter to keep up to date. Learn More
Unsubscribing is easy, and I'll keep your email address private.

Getting Started

Are you new here? Check out my review posts for a tour of the archives:

  • Lessons from the 2020 LeetCode Monthly Challenges
  • 2019 in Review
  • Competitive Programming Frequently Asked Questions: 2018 In Review
  • What I Learned Working On Time Tortoise in 2017
  • 2016 in Review
  • 2015 in Review
  • 2015 Summer Review

Archives

Recent Posts

  • LeetCode 1022: Sum of Root To Leaf Binary Numbers January 27, 2021
  • LeetCode 1288: Remove Covered Intervals January 20, 2021
  • LeetCode 227: Basic Calculator II January 13, 2021
  • A Project for 2021 January 6, 2021
  • Lessons from the 2020 LeetCode Monthly Challenges December 30, 2020
  • Quora: Are Math Courses Useful for Competitive Programming? December 23, 2020
  • Quora: Are Take-Home Assignments a Good Interview Technique? December 17, 2020
  • Quora: Why Don’t Coding Interviews Test Job Skills? December 9, 2020
  • Quora: How Much Time Should it Take to Solve a LeetCode Hard Problem? December 2, 2020
  • Quora: Quantity vs. Quality on LeetCode November 25, 2020
Red-Green-Code
  • Home
  • About
  • Contact
  • Project 462
  • CP FAQ
  • Newsletter
Copyright © 2021 Duncan Smith