Gatherer Downloader App (v1.6.8) [hq pics / card database]
Posted: 20 Oct 2010, 03:49
Current version is v1.6.8 - http://ark42.com/mtg/gathererdownloader ... -1.6.8.zip
I added a link to it on my main MTG page at http://ark42.com/mtg/ which is the best place to go if the version listed here is deleted and replaced with a newer version.
Basically, it can download card info from Wizards and save a "database" of card info in lots of formats: XML, CSV, C++, Java, PHP, JSON, SQLite. The app is written in MFC and compiled for Unicode, so it will require Windows 2000/XP/Vista/7, but everything is statically linked, so there are no DLLs required, and it does not need you to have Java or .NET installed. It can download any foreign language text as you select, and different save formats will save it as UTF-8 or UTF-16LE as appropriate.
English databases in XML format, for all cards from all sets (up to Dragon's Maze) with all card information, are now available at: http://ark42.com/mtg/gathererdownloader/MTGCardInfo/
You can load this XML into the program and save it to any of the other formats if you need.
It can also download card images and automatically crop the rounded border off (Fulls) and the art out (Crops) and save the JPGs for all the cards you want, in any language. Images can be saved by oracle ID or by card name in folders by set name or set code. There are defaults included to download the files into the formats needed for Magic Workstation, Magic Album, and Wagic. The CCGHQ high quality images that I host at ark42.com can also be downloaded instead of the regular images from Gatherer, but these HQ images are only available in English.
There is also a feature to download the set symbols and mana symbols images as well.
More detailed information about the program (available under Help/Information) is also available here:
[Card Databases]
A card database is just a list of the card information you choose to download from Wizards. You can save the information from the report grid to a variety of formats such as XML, CSV, CPP, Java, PHP, JSON, and SQLite. Only XML and CSV can be loaded back into the program.
The Append Card Database option works like Open Card Database, except that the loaded file is appended to the current report grid. With the regular Open Card Database option, the grid is cleared first so that only the opened file is displayed.
[Excel]
There are some important things to note about Excel and CSV files. First, double-clicking on a *.csv file and causing it to open in Excel will not do the same thing as selecting the file from File / Open under the menu. If you use File / Open or select the file from the recently opened files under the File menu, it will be the same as if you used the Text Import Wizard (available in Excel 2002/2003 under Data / Import External Data / Import Data, and Excel 2007 at Data / Get External Data / From Text). The Text Import Wizard will not import files correctly if they contain multiple lines of text in a single field. Only double-clicking on the *.csv file, or dragging the *.csv file from Explorer to Excel will open it in a way that preserves the multiline fields such as the card rules column.
Excel will also always assume that comma separated value files are in ANSI format, even though they may contain UTF-8 characters for foreign languages. The Downloader will create *.csv files with a UTF-8 BOM that marks them as UTF-8 files, but Excel will ignore this, so regular *.csv files are really only useful for English data. Also, Excel 2002 and older will display the BOM as 3 bytes of garbage in the top left cell, but you can easily backspace over that. When you save a *.csv file from Excel, it will also be written with ANSI characters only, so again, it is only useful for English.
Excel can open UTF-16LE files and will correctly show text in all languages. However, it will never separate the fields at the commas, so all of the data would appear bunched in the first column. For whatever reason, Excel will work with tabs between fields. This type of file should be called TSV and saved with a *.tsv extension, but then you couldn't double-click on it to open it properly, so there is a file format you can select to save as "Excel CSV". This will create a *.csv file that uses tabs instead of commas, and will open in Excel wonderfully with text in all languages. You can save this type of file from Excel by selecting "Unicode Text (*.txt)" from the file type dropdown in the Save As window. Excel will default to saving the file with a *.txt extension, but again, *.tsv would be more appropriate. You can load TSV files (even if they have *.csv or *.txt extensions) back into Downloader if you wish.
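For anyone producing this kind of file outside the Downloader, here is a minimal Python sketch of the "Excel CSV" idea described above: tab-separated fields, UTF-16LE, with a BOM at the start. The function name and the sample rows are made up for illustration, and this is not the Downloader's own code.

```python
import csv
import io

def write_excel_csv(rows, stream):
    """Write rows as tab-separated text to a binary stream, as UTF-16LE with a BOM."""
    text = io.StringIO(newline="")
    writer = csv.writer(text, delimiter="\t", lineterminator="\r\n")
    writer.writerows(rows)
    # The "utf-16-le" codec does not emit a BOM by itself, so U+FEFF is prepended.
    stream.write(("\ufeff" + text.getvalue()).encode("utf-16-le"))

# Hypothetical sample data; the second row has a multiline rules field.
buffer = io.BytesIO()
write_excel_csv([["name", "rules"], ["Sample Card", "Line one\nLine two"]], buffer)
data = buffer.getvalue()
```

The csv module quotes the multiline field automatically, so the line break stays inside one cell.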
[Edit Set List]
The set list is a CSV formatted table of information for each set. List sets here one per line. The 1st field is a category name, which can be used when saving images, to group various sets into different subfolders. The 2nd field must match the set code used on Gatherer, and the 3rd field must match the full set name used on Gatherer. The 4th field is optional, and if it is present, it is an alternate set code to use when creating subfolders for images. Similarly, the 5th field is also optional. It can be used as an alternate set name for subfolders, if the option to use them is selected in the Image Formats window.
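As a rough illustration of the five-field layout described above, this Python sketch parses set list lines into records with the two optional fields left empty when absent. The class name, function name, and the two sample rows are hypothetical and are not taken from the list shipped with the program.

```python
import csv
from dataclasses import dataclass
from typing import Optional

@dataclass
class SetEntry:
    category: str                      # 1st field: category name for subfolders
    code: str                          # 2nd field: set code as used on Gatherer
    name: str                          # 3rd field: full set name as used on Gatherer
    folder_code: Optional[str] = None  # 4th field (optional): alternate set code
    folder_name: Optional[str] = None  # 5th field (optional): alternate set name

def parse_set_list(text):
    """Parse one set per line; the first three fields are required."""
    entries = []
    for fields in csv.reader(text.splitlines()):
        if not fields:
            continue  # skip blank lines
        fields = [field.strip() for field in fields]
        if len(fields) < 3:
            raise ValueError("need at least category, set code, and set name: %r" % fields)
        optional = [field or None for field in fields[3:5]]
        entries.append(SetEntry(*fields[:3], *optional))
    return entries

# Illustrative rows only.
sample = 'Expansions,DGM,"Dragon\'s Maze"\nCore,M13,Magic 2013,M2013,Magic 2013 Core Set'
entries = parse_set_list(sample)
```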
This list will populate the set dropdown used to download card databases, as well as be used for the set symbol download option.
[Edit Symbol List]
Very similar to the set list, except there are only two fields, and the 2nd field is optional. This list is used for the mana symbol download option and for translating the mana symbols from a card's printed text field. The 2nd field, if present, will be the name used to create images.
[Downloading Cards]
You must download or open a card database before you can download card images. One card image for each card listed will be downloaded, and one or more files will be created on your computer for each card, depending on the options selected. Card images are saved directly to files as they are downloaded, while the card database is only downloaded into the report grid in the program. You can save the card list in the report grid to a variety of file formats, but only XML and CSV can be loaded back into the program later.
Wizards does not have a method to download all of the card information in one easy pass, so certain combinations of options will result in many small downloads to retrieve the information you select.
All downloads start out by getting most of the information from the text spoiler for each set. Unfortunately, this information is only available from Wizards in English, so if any other language is selected, the downloader must check each card one by one to get the card ID for the other language.
After starting a download, changing the options will not take effect unless you cancel and start a new download. Certain options such as Save and Open are greyed out and unavailable until the download completes.
If something happens in the middle of a very long download of multiple sets, you should be able to resume by starting at the set that was partially downloaded. No duplicate cards will be added, as the ID field must be unique for each entry. Even for downloads of foreign languages where half the set was translated before the download stopped, this will still work. The set will be downloaded from the beginning in English, then as cards are translated, the duplicate entries will be removed once the ID is translated.
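The resume behavior can be pictured with a small Python sketch, assuming the report grid behaves like a dictionary keyed by card ID. The function names, the dictionary layout, and the ID numbers are all assumptions for illustration; the program's internal data structures are not documented here.

```python
def merge_cards(grid, downloaded):
    """Add cards to the grid (a dict keyed by card ID), skipping IDs already present."""
    added = 0
    for card in downloaded:
        if card["id"] not in grid:
            grid[card["id"]] = card
            added += 1
    return added

def translate_id(grid, english_id, translated_id):
    """Re-key a card under its translated ID; drop it if that ID is already in the grid."""
    card = grid.pop(english_id)
    if translated_id not in grid:
        card["id"] = translated_id
        grid[translated_id] = card

english_set = [{"id": 101, "name": "Card A"}, {"id": 102, "name": "Card B"}]

grid = {}
merge_cards(grid, [dict(card) for card in english_set])
translate_id(grid, 101, 901)  # one card translated, then the download stops

# Resuming: the set is fetched again in English from the beginning.
merge_cards(grid, [dict(card) for card in english_set])  # 101 comes back, 102 is skipped
translate_id(grid, 101, 901)  # 901 already exists, so the duplicate is removed
```

After the resumed run the grid holds exactly one entry per card, which is the outcome the paragraph above describes.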
[Additional Card Information]
This option will download the printed name, printed type, printed rules text, flavor text, watermark, card collector number, artist name, and card rulings. This requires a download for each card one by one, since this information is not available on the spoiler pages. This may take additional time, and is not needed if you only wish to download the card images.
For languages other than English, the printed name, printed type, printed rules text, and flavor text are downloaded in the language selected. The regular name, type, and oracle rules columns are always in English, because that is how the official Wizards rules work for the cards. Without this option selected, downloads of other languages will only have the card ID field translated. If you are only downloading the card images, even for other languages, you do not need to check this option, as it will only take additional download time.
If the language is set to English, and this option is not selected, card IDs for cards with multiple art variations will be downloaded from the compact spoiler download. This is much faster than the methods required for other option combinations, because all the information for a set is available on one page. Unfortunately, the compact spoiler, just like the text spoiler, is only available from Wizards in English.
[Image Sources]
By default, all images are downloaded from Gatherer, just as all card information is. You can select an alternate source to download card images from, and each source has its own trade-offs.
The images on ark42.com are much higher quality, but are only available in English and only available pre-cropped to the "Fulls" format with only a partial border around the card.
[Image Formats]
You may select any combination of the three image formats, and a folder will be created for each format.
The "Originals" are the card images exactly as they appear on wizards.com: 223x310 pixels (for most regular cards) with a full border and rounded corners.
The "Fulls" are cropped so that 6 pixels are removed from each edge, which is about half of the black border and removes the rounded corners.
The "Crops" are cropped to just the art in the top: 183x133 pixels.
Special cases are made for Plane and Scheme card "Crops" by checking the picture size in pixels. They are cropped to remove all borders, even though this loses bits of art that are mixed into the borders or partially transparent in the borders.
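The "Fulls" arithmetic above is easy to check with a few lines of Python. This is only a sketch of the crop box calculation; the function name is made up, and the (left, upper, right, lower) tuple order is chosen because it is the box layout that image libraries such as Pillow accept for cropping.

```python
def fulls_box(width, height, margin=6):
    """Return the (left, upper, right, lower) box left after trimming every edge."""
    if width <= 2 * margin or height <= 2 * margin:
        raise ValueError("image is too small to trim %d pixels per edge" % margin)
    return (margin, margin, width - margin, height - margin)

box = fulls_box(223, 310)
size = (box[2] - box[0], box[3] - box[1])
print(box, size)  # (6, 6, 217, 304) (211, 298)
```

So a regular 223x310 original ends up as a 211x298 "Full".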
No image files will be replaced if they exist, which means you can stop and later restart any of the image file downloads, and they will quickly resume where they left off, assuming you select the same folder to put them in.
[Image Subfolders]
Downloaded images will already be in a folder based on the image format or formats selected. This folder is required unless you select only one of the three format types, and then unselect the option to use a per-format subfolder. From there, another folder for each set can be created, which can be named after the full set name or the short set code, depending on which option you select. You can also select to use no per-set folder, which will place all images in the same folder for each format selected.
An additional subfolder can be created for the language of the card. This subfolder will be created inside of the set subfolder if both subfolder options are used. You can select either the full language name or a short 2 or 3 letter code for the language.
Set folders can also be placed into category subfolders, if that option is selected. The categories can be set to any text you want in the Edit Set List option.
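To make the nesting concrete, here is a small Python sketch that builds a folder path from whichever subfolder options are in use. The function and argument names are hypothetical, and placing the category folder between the format folder and the set folder is an assumption; only the "language inside set" order is stated above.

```python
from pathlib import PureWindowsPath

def image_folder(root, image_format=None, category=None, set_folder=None, language=None):
    """Join the enabled subfolder levels under root; levels passed as None are skipped."""
    # Assumed order: format / category / set / language.
    parts = [part for part in (image_format, category, set_folder, language) if part]
    return PureWindowsPath(root, *parts)

all_levels = image_folder("Images", "Fulls", "Expansions", "DGM", "en")
flat = image_folder("Images")  # single format, no per-format, set, or language folders
```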
[Image File Names]
Inside each of the folders created based on the image format and image subfolder settings, the image files will be saved and named after the card ID, name, or printed name. The printed name can only be used if you downloaded the Additional Card Information. Otherwise, the regular (English) name will be used for the file.
If you select the option to use the ID for the filename instead, then all card images will be named with nothing but the multiverse ID number and file extension.
The option "Use .full.jpg instead of just .jpg" is only enabled if the image format "Fulls" is selected. This causes the files to get an extra ".full" extension inserted into the filename, which is helpful to some programs to tell it apart from crops.
If you are using the card name or printed name, then you can also select how to handle the numbered suffix for cards with multiple art versions, as well as how to handle split cards. The number can be placed after the card name with a space or no space, and then by itself, in parentheses, or in square brackets. Split cards can either just squish the two card names together, or can have an underscore between the two names.
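The naming choices above can be sketched in Python as follows. The function names, parameter names, and sample card names are only for illustration and do not reflect how the program implements these options.

```python
def image_file_name(name, art_number=None, separator=" ", wrap="", full_ext=False):
    """Build a file name from a card name.

    separator: " " or "" between the name and the art number.
    wrap: "" for a bare number, "()" for parentheses, "[]" for square brackets.
    full_ext: use ".full.jpg" instead of ".jpg".
    """
    stem = name
    if art_number is not None:
        number = str(art_number)
        if wrap:
            number = wrap[0] + number + wrap[1]
        stem += separator + number
    return stem + (".full.jpg" if full_ext else ".jpg")

def split_card_name(first, second, underscore=False):
    """Join the two halves of a split card, squished together or with an underscore."""
    return first + ("_" if underscore else "") + second

print(image_file_name("Forest", 2, " ", "()"))                # Forest (2).jpg
print(image_file_name("Forest", 3, "", "", full_ext=True))    # Forest3.full.jpg
print(image_file_name(split_card_name("Fire", "Ice", True)))  # Fire_Ice.jpg
```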
[Gatherer Bugs]
A few types of formatting errors in the rules text are corrected automatically, such as i} for snow mana is corrected to just , ooX is corrected to :, and ocT is corrected to . Split and flip cards are reassembled into one card listing automatically. Also, the em dashes in the type lines are changed to hyphens, which are easier for most people to deal with.
On the plus side, Gatherer is entirely presented in UTF-8 XHTML, making it very easy to parse as XML data. A big thumbs up to Wizards for this. Unfortunately, there are slight parse errors here and there, which the downloader will try to work around.