Hello, its time for the second entry discussing progress on the RFG rebuild. I know its been a little quiet on the front page in this regard but work is progressing. This time I'm mostly going to focus on the database work that is ongoing.
As I mentioned in the previous dev log, we have a large amount of data that we are planning to transform into a new structure to allow us more flexibility in adding game data. However, this means we need to take all the existing data and move it to a new database. This isn't a simple task since we are not performing a one-to-one migration. A lot of the data needs to be cleaned up and reformed into the new structure. The first hurdle is fixing the developers and publishers fields. If you've ever added an entry to the site, you know that those fields are free form text - meaning you can put any text you want in there. The problem that has is we have a lot of duplicate entries, spelling errors, and different styles people have used. Take, for instance, a game with two developers (A and B) - I've seen all of the following different styles in the DB:
- A / B
- A & B
- A and B
- A with B
- A, B
- A; B
- A (B)
- A (for PC) / B (for Mac)
Not to mention duplicate entries (not including typos) for different company designations
- A, Ltd.
- A Ltd
- A, Ltd
- A Ltd.
- A Limited
This makes it very hard to migrate into a "Company" table to normalize and very hard to manually go through the data to "fix" it. We have nearly 15000 different unique entries just for companies. I am slowly going through and fixing these to populate a new table that we can tie games and hardware to. We also want to pull out all of the single person names to populate a "person" table that can be used for game credits as well as developer/publisher fields. The good news is once its done, we won't have to worry about duplicates or typos going forward. When submitting a game, you'll just need to select the company from the existing values (don't worry there will be a way to add companies as well if needed). Further, since its hard to verify that there still won't be duplicates after this, I'm planning on adding a "merge companies" function for site admins in case there were mistakes in the migration.
Once the companies are migrated, there will be a few smaller tables that will need the same treatment such as controller and genre fields. Then a more programmatic approach can be taken to migrate all the actual games and hardware and tie them to new entries for all the mentioned fields - standardizing all the data. This data work will also allow us to start building concrete APIs and UI mockups to move forward which will be much more tangible for most people reading this.
Anyway that's all for this one, I hope it was a fun read. Once more progress is made, I imagine entry 3 will discuss some more of the data structure and API work that will need to happen to move forward.