The Dumbest Downloader That Could Possibly Work (and did)

Post by eternaleye » 03 Nov 2012, 01:31

I learned a long time ago that, when coding, it is always a good idea to at least try The Dumbest Thing That Could Possibly Work, because if it does you will save yourself a lot of effort.

It seems to have worked.

This Perl program, when run, downloads the card data sets used by MTGO from the same sources that MTGO itself uses when installing and updating.

It basically mimics the updater's logic.

Put simply, the installer uses a Microsoft split installer assembly, and downloads its components piecemeal. This kind of installer keeps its data as nice, convenient XML, including easy-to-use references to where the pieces are.

Half an hour later, I have this:

Code:
#!/usr/bin/env perl

use strict;
use warnings;
use v5.14;

use LWP::Simple;
use XML::XQL;
use XML::XQL::DOM;

my $parser = XML::DOM::Parser->new;

my %config;

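sub doget {
    my $file = shift;
    my $out = $file;
    $out =~ s,^.*/,,;    # keep only the basename
    my $rc = mirror( $config{server} . $file, $out );
    # mirror() returns the HTTP status; 304 means the local copy is current
    die "Failed to fetch $file: HTTP $rc\n"
        unless is_success( $rc ) || $rc == 304;
    return( $out );
}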

sub locateManifest {
    my $rootdoc = $parser->parsefile( doget( shift ) );
    my ( $result ) = $rootdoc->xql( "/asmv1:assembly/dependency/dependentAssembly" );
    my $location = $result->getAttribute( 'codebase' );
    $location =~ tr,\\,/,;
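    # Split into the versioned directory and the manifest file name
    $location =~ m,(.*/)(.*), or die "Unexpected codebase '$location'\n";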
    my ( $verdir, $fname ) = ( $1, $2 );
    return( $verdir, $fname );
}

sub locateData {
    my $dir = shift;
    my $manifestdoc = $parser->parsefile( doget( $dir . shift ) );
    my @results = $manifestdoc->xql( "/asmv1:assembly/file" );
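    # Normalise path separators, keep only the card data files, and
    # prefix each name with the versioned directory (read bottom-up).
    return(
        map  { $dir . $_ }
        grep { $_ =~ m,^CardDataSource/, }
        map  { $_ =~ tr,\\,/,; $_ }
        map  { $_->getAttribute( 'name' ) } @results
    );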
}

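# The server base URL is the only argument; it must end with a slash
$config{server} = shift( @ARGV )
    or die "Usage: $0 <server base URL>\n";

my $rootfile = "MTGO.application";
doget( $_ ) for locateData( locateManifest( $rootfile ) );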
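
The path handling in locateManifest can be tried on its own. This is only a sketch using core Perl: split_codebase is a name made up for illustration, and the sample codebase value is hypothetical, not copied from a real manifest.

```perl
use strict;
use warnings;

# Split a manifest "codebase" value into (directory, filename).
# The manifests use Windows separators, so normalise them first.
sub split_codebase {
    my $location = shift;
    $location =~ tr{\\}{/};
    my ( $dir, $file ) = $location =~ m{\A(.*/)([^/]*)\z}
        or die "No directory component in '$location'\n";
    return ( $dir, $file );
}

# Hypothetical value, for illustration only
my ( $dir, $file ) =
    split_codebase( 'Application Files\\MTGO_1_0_0_0\\MTGO.exe.manifest' );
print "$dir\n$file\n";
```

The directory part keeps its trailing slash, so it can be prepended directly to the file names listed in the second manifest.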

The forum made me remove the URL for the server ("spamminess"), but it's an HTTP URL for mtgoclientdepot, a subdomain of onlinegaming, which is itself a subdomain of the WOTC site. Pass that base URL, with a trailing slash, as the script's only argument.

The files it produces are XML and use IDs to cross-reference each other. As far as I can tell, it has all sets; it includes Vanguard (including full oracle text!), flavor text, and everything else I have thought to look for. Enjoy!

EDIT: Oh, and it's also quite fast. Sub-15min runtime.

Post by eternaleye » 03 Nov 2012, 05:00

New version!

It now checks the hashes of the downloaded files, so that files don't get redownloaded if unchanged.
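
The hash check boils down to something like the following sketch, which assumes only the core Digest::SHA module. The helper names padded_b64digest and file_matches are made up for illustration; the padding loop is needed because Digest::SHA returns base64 digests without the trailing '=' characters.

```perl
use strict;
use warnings;
use Digest::SHA;

# Return the base64 digest of a file, padded to a multiple of four
# characters so it can be compared with a padded digest string.
sub padded_b64digest {
    my ( $algorithm, $path ) = @_;
    my $digest = Digest::SHA->new( $algorithm )
                            ->addfile( $path, 'b' )
                            ->b64digest;
    $digest .= '=' while length( $digest ) % 4;
    return $digest;
}

# True if the file exists and matches the expected digest.
sub file_matches {
    my ( $algorithm, $path, $expected ) = @_;
    return -e $path && padded_b64digest( $algorithm, $path ) eq $expected;
}
```

A download can then be skipped whenever file_matches( 'sha1', $path, $expected ) is true, where $expected is the digest string taken from the manifest.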

Also, it now drops the manifest and application file in internal/ and the actual data in source/ rather than splattering them all over $PWD.
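
As a rough sketch of that layout using only core modules: the helper below maps a remote path to a local file under a category directory, creating the directory if it is missing. local_path_for is a hypothetical name, and this is an alternative to hand-rolled path munging, not the code in the script.

```perl
use strict;
use warnings;
use File::Basename qw/basename/;
use File::Path qw/make_path/;
use File::Spec;

# Map a remote path to a local path under the given category directory
# ('internal' or 'source' in the layout described above).
sub local_path_for {
    my ( $category, $remote ) = @_;
    make_path( $category );    # does nothing if the directory exists
    return File::Spec->catfile( $category, basename( $remote ) );
}
```

Calling local_path_for( 'internal', $remote ) for manifests and local_path_for( 'source', $remote ) for card data keeps the two kinds of file apart.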

Currently working on letting it parallelize if you have Coro installed, and operating normally if not.

Code:
#!/usr/bin/env perl

use strict;
use warnings;
use v5.14;

# In core, guaranteed present
use Module::Load::Conditional;
use Digest::SHA;    # only the object interface is used below

use LWP::Simple;
use XML::XQL;
use XML::XQL::DOM;

my $parser = XML::DOM::Parser->new;

my %config;

sub srcToFile {
    my $type = shift;
    my $src = shift;
    my $out = $src;
    $out =~ s,^.*/,,;
    $out = $type . '/' . $out;
    return( $out );
}

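sub _doget {
    my ( $src, $dest ) = @_;
    my $rc = mirror( $config{server} . $src, $dest );
    # mirror() returns the HTTP status; 304 means the local copy is current
    die "Failed to fetch $src: HTTP $rc\n"
        unless is_success( $rc ) || $rc == 304;
    return( $rc );
}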

sub checkHash {
    my $type = shift;
    my $file = shift;
    my $digest = shift;
    my $verifier = Digest::SHA->new( $type )->addfile( $file, 'b' )->b64digest();
    while ( ( length( $verifier ) % 4 ) != 0 ) {
        $verifier .= '=';
    }

    if ( $verifier eq $digest ) {
        return( 1 );
    } else {
        say "$file has hash $verifier, does not match $digest";
        return( 0 );
    }
}

sub checkSize {
    my $file = shift;
    my $size = shift;
    return( ( -s $file ) == $size );
}

sub testAndMkdir {
    my $dir = shift;
    if ( ! -d $dir ) {
        mkdir( $dir ) or die "Couldn't create $dir: $!";
    }
}

sub needsDownloaded {
    my $type = shift;
    my $file = shift;
    my $dom = shift;

    return( 1 ) unless -e $file;

    unless ( checkSize( $file, $dom->getAttribute( 'size' ) ) ) {
        return( 1 );
    }

    for ( $dom->xql('./hash') ) {
        # The algorithm name is the fragment of the xmldsig URI ("...#sha1")
        my ( $method ) = $_->xql( './dsig:DigestMethod' );
        my ( undef, $algo ) = split( /#/, $method->getAttribute( 'Algorithm' ) );

        my ( $hash ) = $_->xql( './dsig:DigestValue' );
        $hash = $hash->getChildNodes()->item( 0 )->getNodeValue();

        if ( checkHash( $algo, $file, $hash ) ) {
            say "$file has good hash of type $algo";
        } else {
            return( 1 );
        }
    }
    return( 0 );
}

sub doget {
    my $type = shift;
    my $from = shift;
    my $to = srcToFile( $type, $from );
    my $dom = shift;
    if ( needsDownloaded( $type, $to, $dom ) ) {
        say "$from --> $to";
        _doget( $from, $to );
    }
    return( $to );
}

sub locateManifest {
    my $src = shift;
    my $file = srcToFile( 'internal', $src );
    _doget( $src, $file );

    my $rootdoc = $parser->parsefile( $file );
    my ( $result ) = $rootdoc->xql( "/asmv1:assembly/dependency/dependentAssembly" );
    my $location = $result->getAttribute( 'codebase' );
    $location =~ tr,\\,/,;
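    # Split into the versioned directory and the manifest file name
    $location =~ m,(.*/)(.*), or die "Unexpected codebase '$location'\n";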

    my ( $verdir, $fname ) = ( $1, $2 );
    return( $verdir, $fname, $result );
}

sub locateData {
    my $dir = shift;
    my $manifest = shift;
    my $rootdom = shift;

    my $manifestdoc = $parser->parsefile(
        doget( 'internal', $dir . $manifest, $rootdom )
    );
    my @results = $manifestdoc->xql( "/asmv1:assembly/file" );

    my @data = map {
        $_->[0] = $dir . $_->[0];
        $_
    } grep {
        $_->[0] =~ m,^CardDataSource/,
    } map {
        $_->[0] =~ tr,\\,/,;
        $_
    } map {
        [ $_->getAttribute( 'name' ), $_ ]
    } @results;

    return( @data );
}

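# The server base URL is the only argument; it must end with a slash
$config{server} = shift( @ARGV )
    or die "Usage: $0 <server base URL>\n";

my $rootfile = "MTGO.application";

# mirror() does not create directories, so make sure both targets exist
testAndMkdir( $_ ) for qw/internal source/;

doget( 'source', $_->[0], $_->[1] ) for locateData( locateManifest( $rootfile ) );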

EDIT:

I've examined the data some more. It looks like sets that aren't playable in MTGO don't have oracle text, among other things. On the other hand, there are some things that are rather nice for machine parsing:
* Reminder text is enclosed in @i(Reminder Goes Here)i@
* Newlines are included in the oracle text as a "\n"
* Various bits of rules text, such as the text for leveler levels, flipped cards, level-up costs, types and subtypes, zones, colors, CMCs, and a few others, are factored out into their own files, with IDs that the cards cross-link to.
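
Those first two conventions make the oracle text easy to clean up. Here is a minimal sketch, assuming the newline marker is a literal backslash followed by "n" (two characters) and that reminder spans are not nested. oracle_lines is a made-up name, and the sample string is invented to match the described format, not taken from the data files.

```perl
use strict;
use warnings;

# Strip @i(...)i@ reminder-text spans, then split on the "\n" markers.
sub oracle_lines {
    my $text = shift;
    $text =~ s/\s*\@i\(.*?\)i\@//gs;
    my @lines;
    for my $line ( split /\\n/, $text ) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @lines, $line if length $line;
    }
    return @lines;
}

# Invented sample in the described format
my $sample = 'Flying @i(This creature can\'t be blocked except by '
           . 'creatures with flying or reach.)i@\nWhen this enters, draw a card.';
print "$_\n" for oracle_lines( $sample );
```

If the data turns out to hold real newline characters instead, the split pattern would need to be /\n/.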