
Magic Data

Posted: 07 Jul 2009, 09:30
by Arch
Data: http://dl.dropbox.com/u/2771470/mtg-data-2015-11-08.zip
Text: http://dl.dropbox.com/u/2771470/mtg-dat ... -11-08.zip

Source is available at:
Gatherer download/parsing: https://github.com/karmag/loa
Text version: https://github.com/karmag/loa-format
Support library: https://github.com/karmag/ants

This project has been discontinued. The last version of the software is compiled and ready to go here:
http://dl.dropbox.com/u/2771470/loa-standalone.zip

Original post:

I was wondering how and where the developers around here get their data. The data I'm talking about is card information (names, P/T, Oracle text...), which sets contain which cards, and other such straight-up data.

The way I currently do it (and suspect others do as well) is to parse whatever decently structured sources I can find (Crystal Keep, Gatherer, MWS and the comprehensive rules, to name a few). The problem is that none of this data is really formatted for easy processing: there is duplicated data, missing data and plainly malformed data.
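A minimal sketch of the kind of cleanup this involves, assuming each source has already been parsed into plain dicts of field name to string. The `merge_sources` helper and the field names are hypothetical. The sketch keys records by card name, keeps the first non-empty value for each field, and reports conflicting or missing values instead of silently picking one.

```python
from typing import Dict, List, Tuple

# Hypothetical: fields every merged card record is expected to have.
REQUIRED_FIELDS = ("name", "cost", "type", "text")

Card = Dict[str, str]


def merge_sources(sources: List[List[Card]]) -> Tuple[Dict[str, Card], List[str]]:
    """Merge card records from several parsed sources, keyed by card name.

    The first non-empty value seen for a field wins; later differing values
    are reported as conflicts rather than overwriting it.
    """
    merged: Dict[str, Card] = {}
    problems: List[str] = []
    for source in sources:
        for record in source:
            name = record.get("name", "").strip()
            if not name:
                problems.append(f"record without a name: {record!r}")
                continue
            card = merged.setdefault(name, {})
            for field, value in record.items():
                value = value.strip()
                if not value:
                    continue  # an empty string counts as missing data
                if field in card and card[field] != value:
                    problems.append(
                        f"{name}: conflicting {field!r}: {card[field]!r} vs {value!r}"
                    )
                else:
                    card[field] = value
    # After merging, report fields that no source supplied.
    for name, card in merged.items():
        for field in REQUIRED_FIELDS:
            if field not in card:
                problems.append(f"{name}: missing {field!r}")
    return merged, problems
```

Reporting rather than resolving conflicts leaves the decision about which source to trust to a human, which matters when every source has its own known errors.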

So does anyone have a better source for this kind of data?

Re: Magic Data

Posted: 07 Jul 2009, 10:22
by Snacko
Spider Gatherer for cards and organise the data however you feel is right; that's where all the card information comes from. However, look out for the new Gatherer bugs, especially when parsing non-English cards.

For rules, the Crystal Keep rulings summaries are nearly always properly formatted, and pretty much everyone uses them.

I only use the GH MWS masterbase, because it contains promo cards that are missing from Gatherer. The other upside is that GH puts effort into ensuring the card data doesn't contain Gatherer bugs, and he fixes all reported issues.

Re: Magic Data

Posted: 07 Jul 2009, 17:54
by MageKing17
The old Gatherer used to spit out a perfectly suitable Oracle spoiler, but for some reason that option is no longer there (or it wasn't last time I looked). As for determining which cards are in which sets, both the old and new Gatherer can produce a "text spoiler". It is similar to an Oracle spoiler but contains extra information, such as rarity and whatnot. However, it lacks line breaks in the rules text, which makes it unsuitable as a replacement for the Oracle spoiler: it makes a world of difference whether a "whenever" is part of an ability or the start of a separate one.
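To illustrate the line-break point, here is a minimal sketch assuming Oracle-style rules text with one ability per line. The two rules texts are made up for illustration, and `split_abilities` and `flatten` are hypothetical helpers. The first text is two abilities; the second is a single activated ability that also sets up a "whenever" trigger. Once the line break is flattened into a space, as in the text spoiler, the two become indistinguishable.

```python
from typing import List


def split_abilities(rules_text: str) -> List[str]:
    """Split Oracle-style rules text into abilities, one per non-empty line."""
    return [line.strip() for line in rules_text.splitlines() if line.strip()]


def flatten(rules_text: str) -> str:
    """Mimic a spoiler that joins all lines with spaces, losing ability boundaries."""
    return " ".join(split_abilities(rules_text))


# Made-up examples: two separate abilities...
as_two = "Sacrifice a creature: You gain 2 life.\nWhenever you gain life this turn, draw a card."
# ...versus one activated ability whose effect includes the "whenever" part.
as_one = "Sacrifice a creature: You gain 2 life. Whenever you gain life this turn, draw a card."
```

`split_abilities(as_two)` yields two abilities and `split_abilities(as_one)` yields one, yet `flatten(as_two) == flatten(as_one)`, so no parser can recover the original structure from the flattened text alone.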

Re: Magic Data

Posted: 09 Jul 2009, 09:12
by Arch
I decided to implement some tools for extracting the data from Gatherer. I did encounter some weird errors, as mentioned by Snacko (mainly wrong rarities), but they seem to mostly affect older cards/sets.

I've made the data available here if anyone wants it:
http://www.easy-share.com/1906676470/mtg-data.zip

It's XML-formatted and is organized as follows:
cardlist.xml: All the rules information for all cards; basically all information that is not set-dependent. Un* sets are not included here.
glossary.xml, rules.xml: The comprehensive rules in XML format.
setlist.xml: All the sets with set codes, release dates and such. Information taken from Wikipedia and the Wizards website.
set/*.xml: Set-specific information on a set-by-set basis. Contains rarity, collector numbers and artist.
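A minimal sketch of consuming a file like cardlist.xml with Python's standard `xml.etree.ElementTree`. The element names (`cardlist`, `card`, `name`, `cost`, `type`, `pt`, `rules`) are assumptions for illustration only; the actual schema of the files is not described here and may differ.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Hypothetical layout; the real element names in cardlist.xml may differ.
SAMPLE = """<cardlist>
  <card>
    <name>Absolver Thrull</name>
    <cost>3W</cost>
    <type>Creature - Thrull Cleric</type>
    <pt>2/3</pt>
    <rules>Haunt</rules>
    <rules>When Absolver Thrull enters the battlefield, destroy target enchantment.</rules>
  </card>
</cardlist>"""


def load_cards(xml_text: str) -> List[Dict[str, object]]:
    """Parse card elements into plain dicts, one rules entry per ability."""
    root = ET.fromstring(xml_text)
    cards: List[Dict[str, object]] = []
    for card in root.iter("card"):
        pt: Optional[str] = card.findtext("pt")  # None for non-creatures
        cards.append({
            "name": card.findtext("name", default=""),
            "cost": card.findtext("cost", default=""),
            "type": card.findtext("type", default=""),
            "pt": pt,
            "rules": [r.text or "" for r in card.findall("rules")],
        })
    return cards
```

For a file on disk, `ET.parse("cardlist.xml").getroot()` gives the same root element that `ET.fromstring` returns for a string.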

What is not included, though I had wanted to do it, is flavor text and printed rules/types. Since that doesn't really interest me, I didn't spend any time trying to parse that (really annoying) part of Gatherer.

I have not spent any time trying to correct information that might be wrong in Gatherer. This has been straight-up parsing; I've just made sure the data ended up in the right place.

I will make an update when Gatherer is updated with M10.

Re: Magic Data

Posted: 09 Jul 2009, 14:10
by staggerwingjtstw
Question Arch, is that the old Comp Rules, or the one they said they'd be releasing on like, Tuesday?

Re: Magic Data

Posted: 09 Jul 2009, 16:50
by Arch
staggerwingjtstw wrote: Question Arch, is that the old Comp Rules, or the one they said they'd be releasing on like, Tuesday?
Those are the old (090501) comp rules. It seems both Gatherer and the comp rules will be updated tomorrow, though, so I'll do my update then as well.

Thanks for the heads-up. I'm not following the Wizards homepage on a daily basis, as I'm currently on vacation. In my mind the update was further down the line.

Re: Magic Data

Posted: 10 Jul 2009, 19:28
by Arch
Updated with M10 changes.

http://www.easy-share.com/1906697239/mt ... 090710.zip

If anyone is interested in the script behind this, I can make it available.

Re: Magic Data

Posted: 11 Jul 2009, 05:41
by Marek14
For some reason, I can't unpack con.xml :(

EDIT: Also, what do you use to view the XML files properly? I tried the browser, but it still displays tags...

Re: Magic Data

Posted: 11 Jul 2009, 08:37
by Arch
Marek14 wrote: For some reason, I can't unpack con.xml :(
That sounds weird. Is it only con.xml that's failing? Are you able to unpack the rest of the files without trouble?

Marek14 wrote: EDIT: Also, what do you use to view the XML files properly? I tried the browser, but it still displays tags...
Not sure what you mean by "properly". You usually want to see the tags when looking at XML. I myself use whatever text-editor I have available to view XML-files.

Side-note: This package is aimed at developers. If you're not a developer you probably have little to no use for this data.

Re: Magic Data

Posted: 12 Jul 2009, 05:25
by Marek14
Only con.xml. If it was everything, I'd think the file was corrupted, but it's only con.xml.

I am not a developer, but I am always on the hunt for a good text file with all the info. I tried the new Gatherer, but the best I was able to obtain is a gigantic HTML file, which can be browsed offline but still isn't quite what I'm looking for.

Re: Magic Data

Posted: 12 Jul 2009, 16:41
by Arch
http://blogs.msdn.com/oldnewthing/archi ... 55388.aspx

It appears that Windows won't let you create a file named CON with any extension (con.xml, con.txt and so on), regardless of case. Supposedly this is for backwards compatibility with DOS 1.0. (It only affects Microsoft operating systems, of course.) I'm on Ubuntu, so I didn't have any problems with it. It's something that will have to be addressed, though...
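One way this could be addressed is sketched below: check each set file name against the Windows reserved device names before writing it. The reserved list (CON, PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) and the rule that the extension is ignored come from Microsoft's file-naming documentation; the `safe_set_filename` helper and its trailing-underscore renaming scheme are hypothetical choices for illustration.

```python
# Windows reserved device names; matching is case-insensitive and ignores
# everything from the first dot onward, so "con.xml" is treated as "CON".
RESERVED = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def is_reserved(filename: str) -> bool:
    """Return True if Windows would refuse to create this file name."""
    stem = filename.split(".", 1)[0]
    return stem.upper() in RESERVED


def safe_set_filename(set_code: str, ext: str = "xml") -> str:
    """Build a per-set file name that is also valid on Windows.

    Appending an underscore is an arbitrary, hypothetical renaming scheme.
    """
    name = f"{set_code}.{ext}"
    if is_reserved(name):
        name = f"{set_code}_.{ext}"
    return name
```

With this, a set code of `con` would be written as `con_.xml`, while a longer name such as `conflux.xml` is left alone because only the exact stem is reserved.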

Marek14 wrote: I am not a developer, but I am always on the hunt for a good text file with all the info. I tried the new Gatherer, but the best I was able to obtain is a gigantic HTML file, which can be browsed offline but still isn't quite what I'm looking for.
One of the reasons I made this is that, with a proper data format like this, it's quite trivial to manipulate the data, including turning it into plain text. I could help you with it if you don't know how to do it yourself; just give me an idea of how you would like the text to be presented.

Re: Magic Data

Posted: 13 Jul 2009, 04:19
by Marek14
Well, I guess I would like the format of the old Oracle spoiler, which looks like this:

Absolver Thrull
3W
Creature - Thrull Cleric
2/3
Haunt (When this card is put into a graveyard from play, remove it from the game haunting target creature.)
When Absolver Thrull comes into play or the creature it haunts is put into a graveyard, destroy target enchantment.

This is how it looks now:

Name: {Absolver Thrull}
Cost: 3W
Type: Creature — Thrull Cleric
Pow/Tgh: (2/3)
Rules Text: Haunt (When this card is put into a graveyard from the battlefield, exile it haunting target creature.)
When Absolver Thrull enters the battlefield or the creature it haunts is put into a graveyard, destroy target enchantment.
Set/Rarity: Guildpact Common

The Set/Rarity line would be nice to have (perhaps using set codes instead of full names), but it's not strictly necessary.
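A minimal sketch of rendering a card in the old Oracle spoiler layout requested above, plus an optional set/rarity line built from set codes. The function name and its parameters are hypothetical; it assumes the card data has already been loaded into plain strings and lists, with one rules-text entry per ability.

```python
from typing import List, Optional, Tuple


def format_oracle(name: str, cost: str, type_line: str, pt: Optional[str],
                  rules: List[str], printings: List[Tuple[str, str]]) -> str:
    """Render one card in the old Oracle spoiler layout.

    printings is a list of (set code, rarity letter) pairs, e.g. ("GPT", "C").
    """
    lines = [name, cost, type_line]
    if pt:  # only creatures get a P/T line
        lines.append(pt)
    lines.extend(rules)  # one ability per line, as in the Oracle spoiler
    if printings:
        lines.append(", ".join(f"{code} {rarity}" for code, rarity in printings))
    return "\n".join(lines)
```

Joining cards with a blank line between them would reproduce the spoiler as a whole; keeping the set/rarity line last means it can simply be dropped if it is not wanted.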

Re: Magic Data

Posted: 13 Jul 2009, 09:05
by Arch

Re: Magic Data

Posted: 16 Jul 2009, 18:01
by Marek14
Thanks, I got it and the data are nice :) The only change I had to make was replacing the dash in the card text.
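A minimal sketch of that kind of dash replacement, assuming the dash in question is the Unicode em dash Gatherer uses in type lines (as in "Creature — Thrull Cleric" earlier in the thread). Also mapping en dashes and minus signs is an extra assumption, included in case other Unicode dashes appear in the text; `asciify_dashes` is a hypothetical helper.

```python
# Map Unicode em dash, en dash and minus sign to a plain ASCII hyphen.
DASHES = str.maketrans({"\u2014": "-", "\u2013": "-", "\u2212": "-"})


def asciify_dashes(text: str) -> str:
    """Replace em dashes, en dashes and minus signs with a plain hyphen."""
    return text.translate(DASHES)
```

`str.translate` handles all three characters in one pass, which is simpler than chaining several `replace` calls.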

Re: Magic Data

Posted: 19 Jul 2009, 17:46
by Marek14
OK, I found a bug in the text file you posted here. It seems that planeswalker texts are completely off.

Ajani Goldmane
2WW
Planeswalker - Ajani
1/1
{T}: Target creature an opponent controls attacks you this turn if able.
LRW R, M10 M

(The {T} ability above actually belongs to Alluring Siren, and interestingly, Alluring Siren itself appears to be missing from the file.)
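Errors like this can be caught with a simple sanity check. The sketch below is a hypothetical heuristic, not part of the posted tools: it flags any card whose type line contains "Planeswalker" but which has a power/toughness value or no loyalty abilities (lines starting with a cost such as "+1:", "0:" or "-1:", written with a hyphen or a Unicode minus sign). The function name and arguments are assumptions for illustration.

```python
import re
from typing import List, Optional

# A loyalty ability starts with an optional sign, a number and a colon.
# The character class accepts "+", the Unicode minus sign and a hyphen.
LOYALTY_ABILITY = re.compile(r"^[+\u2212-]?\d+:")


def check_planeswalker(name: str, type_line: str, pt: Optional[str],
                       rules: List[str]) -> List[str]:
    """Return a list of suspicious findings for a planeswalker record."""
    problems: List[str] = []
    if "Planeswalker" not in type_line:
        return problems  # only planeswalkers are checked
    if pt:
        problems.append(f"{name}: planeswalker has power/toughness {pt}")
    if not any(LOYALTY_ABILITY.match(line) for line in rules):
        problems.append(f"{name}: no loyalty abilities in rules text")
    return problems
```

Run over the Ajani Goldmane record quoted above, both checks fire, since it has a 1/1 P/T and only a {T} ability.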