The Dumbest Downloader That Could Possibly Work (and did)
by eternaleye » 03 Nov 2012, 01:31
I learned a long time ago that, when coding, it is always a good idea to at least try The Dumbest Thing That Could Possibly Work, because if it does you will save yourself a lot of effort.
It seems to have worked.
This Perl program, when run, downloads the card data sets used by MTGO from the same sources that MTGO itself uses when installing and updating.
It basically mimics the updater's logic.
Put simply, the installer uses a Microsoft split installer assembly, and downloads its components piecemeal. This kind of installer keeps its data as nice, convenient XML, including easy-to-use references to where the pieces are.
Half an hour later, I have this:
#!/usr/bin/env perl
use strict;
use warnings;
use v5.14;
use LWP::Simple;
use XML::XQL;
use XML::XQL::DOM;

my $parser = XML::DOM::Parser->new;
my %config;

# Mirror a file from the server into the current directory; return the local name.
sub doget {
    my $file = shift;
    my $out = $file;
    $out =~ s,^.*/,,;
    my $status = mirror( $config{server} . $file, $out );
    die "Couldn't fetch $file: HTTP $status\n" if is_error( $status );
    return( $out );
}

# Fetch the .application file and find where it says the manifest lives.
sub locateManifest {
    my $rootdoc = $parser->parsefile( doget( shift ) );
    my ( $result ) = $rootdoc->xql( "/asmv1:assembly/dependency/dependentAssembly" );
    my $location = $result->getAttribute( 'codebase' );
    $location =~ tr,\\,/,;
    $location =~ m,(.*/)(.*), or die "Unexpected codebase: $location\n";
    my ( $verdir, $fname ) = ( $1, $2 );
    return( $verdir, $fname );
}

# Fetch the manifest and list the card data files it references.
sub locateData {
    my $dir = shift;
    my $manifestdoc = $parser->parsefile( doget( $dir . shift ) );
    my @results = $manifestdoc->xql( "/asmv1:assembly/file" );
    return(
        map { $dir . $_ }
        grep { $_ =~ m,^CardDataSource/, }
        map { $_ =~ tr,\\,/,; $_ }
        map { $_->getAttribute( 'name' ) } @results
    );
}

# The only argument is the server's base URL; paths are appended to it as-is,
# so it needs its trailing slash.
$config{server} = shift( @ARGV ) or die "Usage: $0 <server base URL>\n";
my $rootfile = "MTGO.application";
doget( $_ ) for locateData( locateManifest( $rootfile ) );
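The path handling in locateManifest is easy to try on its own. Below is a minimal sketch of that step, assuming a codebase value of the shape the script expects. The helper name split_codebase and the sample path are made up for illustration and are not taken from the real manifest.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split a Windows-style codebase path into the
# directory part (with trailing slash) and the file name.
sub split_codebase {
    my $location = shift;
    $location =~ tr,\\,/,;    # backslashes -> forward slashes
    my ( $dir, $fname ) = $location =~ m,^(.*/)?([^/]*)$,
        or die "Unparseable codebase: $location\n";
    return ( $dir // '', $fname );
}

# Made-up example value, for illustration only.
my ( $dir, $fname ) =
    split_codebase( 'Application Files\\Example_1_0\\Example.exe.manifest' );
print "$dir\n$fname\n";
```

The directory part is what locateData later prepends to every file name from the manifest, which is why the trailing slash is kept.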
The files it produces are XML, and they use IDs to cross-reference each other. As far as I can tell it has all sets. It includes Vanguard (with full oracle text!), flavor text, and everything else I have thought to look for. Enjoy!
EDIT: Oh, and it's also quite fast. Sub-15min runtime.
by eternaleye » 03 Nov 2012, 05:00
New version!
It now checks the hashes of the downloaded files, so unchanged files don't get redownloaded.
Also, it now drops the manifest and application file in internal/ and the actual data in source/ rather than splattering them all over $PWD.
Currently working on letting it parallelize if you have Coro installed, while still operating normally if not.
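For the "use Coro if it is there" part, here is a minimal sketch of one way an optional module could be detected with Module::Load::Conditional's can_load. The names have_module, fetch_parallel and fetch_serial are hypothetical stand-ins, and the parallel branch does not actually use Coro; this only shows the fallback decision, not the author's eventual implementation.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Module::Load::Conditional qw/can_load/;

# can_load() returns true only if every listed module could be loaded.
# A version of 0 means "any version will do".
sub have_module {
    my $name = shift;
    return can_load( modules => { $name => 0 } ) ? 1 : 0;
}

# Hypothetical stand-ins for the two download strategies.
sub fetch_parallel { return "parallel: @_" }
sub fetch_serial   { return "serial: @_" }

# Pick a strategy once, up front, and use it everywhere afterwards.
my $fetch = have_module( 'Coro' ) ? \&fetch_parallel : \&fetch_serial;
print $fetch->( 'a.xml', 'b.xml' ), "\n";
```

Choosing the code reference once keeps the rest of the script free of "is Coro loaded?" checks.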
#!/usr/bin/env perl
use strict;
use warnings;
use v5.14;
# In core, guaranteed present
use Module::Load::Conditional;
use Digest::SHA;
use LWP::Simple;
use XML::XQL;
use XML::XQL::DOM;

my $parser = XML::DOM::Parser->new;
my %config;

# Map a server-side path to a local path under the $type directory.
sub srcToFile {
    my $type = shift;
    my $src = shift;
    my $out = $src;
    $out =~ s,^.*/,,;
    $out = $type . '/' . $out;
    return( $out );
}

# Unconditionally mirror a server path to a local file.
sub _doget {
    my ( $from, $to ) = @_;
    my $status = mirror( $config{server} . $from, $to );
    die "Couldn't fetch $from: HTTP $status\n" if is_error( $status );
}

# Compare the file's digest against the base64 value from the manifest.
sub checkHash {
    my $type = shift;
    my $file = shift;
    my $digest = shift;
    my $verifier = Digest::SHA->new( $type )->addfile( $file, 'b' )->b64digest();
    # b64digest() leaves off the '=' padding, so add it back before comparing.
    while ( ( length( $verifier ) % 4 ) != 0 ) {
        $verifier .= '=';
    }
    if ( $verifier eq $digest ) {
        return( 1 );
    } else {
        say "$file has hash $verifier, does not match $digest";
        return( 0 );
    }
}

sub checkSize {
    my $file = shift;
    my $size = shift;
    return( ( -s $file ) == $size );
}

sub testAndMkdir {
    my $dir = shift;
    if ( ! -d $dir ) {
        mkdir( $dir ) or die "Couldn't create $dir";
    }
}

# A file needs downloading if it is missing, has the wrong size,
# or fails any of the hashes listed for it.
sub needsDownloaded {
    my $type = shift;
    my $file = shift;
    my $dom = shift;
    return( 1 ) unless -e $file;
    unless ( checkSize( $file, $dom->getAttribute( 'size' ) ) ) {
        return( 1 );
    }
    for ( $dom->xql('./hash') ) {
        # The algorithm name is the fragment after '#' in the Algorithm URI.
        my ( $method ) = $_->xql( './dsig:DigestMethod' );
        my ( undef, $alg ) = split( /#/, $method->getAttribute( 'Algorithm' ) );
        my ( $hash ) = $_->xql( './dsig:DigestValue' );
        $hash = $hash->getChildNodes()->item( 0 )->getNodeValue();
        if ( checkHash( $alg, $file, $hash ) ) {
            say "$file has good hash of type $alg";
        } else {
            return( 1 );
        }
    }
    return( 0 );
}

# Download $from into the $type directory if needed; return the local path.
sub doget {
    my $type = shift;
    my $from = shift;
    my $to = srcToFile( $type, $from );
    my $dom = shift;
    if ( needsDownloaded( $type, $to, $dom ) ) {
        say "$from --> $to";
        _doget( $from, $to );
    }
    return( $to );
}

sub locateManifest {
    my $src = shift;
    my $file = srcToFile( 'internal', $src );
    _doget( $src, $file );
    my $rootdoc = $parser->parsefile( $file );
    my ( $result ) = $rootdoc->xql( "/asmv1:assembly/dependency/dependentAssembly" );
    my $location = $result->getAttribute( 'codebase' );
    $location =~ tr,\\,/,;
    $location =~ m,(.*/)(.*), or die "Unexpected codebase: $location\n";
    my ( $verdir, $fname ) = ( $1, $2 );
    return( $verdir, $fname, $result );
}

# Returns [ server path, <file> element ] pairs for the card data files.
sub locateData {
    my $dir = shift;
    my $manifest = shift;
    my $rootdom = shift;
    my $manifestdoc = $parser->parsefile(
        doget( 'internal', $dir . $manifest, $rootdom )
    );
    my @results = $manifestdoc->xql( "/asmv1:assembly/file" );
    my @data = map {
        $_->[0] = $dir . $_->[0];
        $_
    } grep {
        $_->[0] =~ m,^CardDataSource/,
    } map {
        $_->[0] =~ tr,\\,/,;
        $_
    } map {
        [ $_->getAttribute( 'name' ), $_ ]
    } @results;
    return( @data );
}

# The only argument is the server's base URL; paths are appended to it as-is,
# so it needs its trailing slash.
$config{server} = shift( @ARGV ) or die "Usage: $0 <server base URL>\n";
# testAndMkdir was defined but never called; the output directories must exist
# before anything is mirrored into them.
testAndMkdir( $_ ) for qw/internal source/;
my $rootfile = "MTGO.application";
doget( 'source', $_->[0], $_->[1] ) for locateData( locateManifest( $rootfile ) );
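The padding loop in checkHash exists because Digest::SHA's b64digest leaves off the trailing '=' characters. Here is a minimal sketch of that comparison on an in-memory string, assuming the manifest's DigestValue is ordinary padded base64. The helper names padded_b64 and digest_matches are hypothetical, and the expected value is computed locally rather than read from a real manifest.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Digest::SHA qw/sha1/;
use MIME::Base64 qw/encode_base64/;

# Pad a base64 string with '=' until its length is a multiple of 4.
sub padded_b64 {
    my $b64 = shift;
    $b64 .= '=' while length( $b64 ) % 4;
    return $b64;
}

# True if $data hashes (with algorithm $alg, e.g. 'sha1') to $expected,
# where $expected is padded base64.
sub digest_matches {
    my ( $alg, $data, $expected ) = @_;
    my $got = padded_b64( Digest::SHA->new( $alg )->add( $data )->b64digest );
    return $got eq $expected;
}

my $data     = "hello\n";
my $expected = encode_base64( sha1( $data ), '' );    # padded, no newline
print digest_matches( 'sha1', $data, $expected ) ? "match\n" : "mismatch\n";
```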
I've examined the data some more. It looks like sets that aren't playable in MTGO don't have oracle text, among other things. On the other hand, there are some things that are rather nice for machine parsing:
- Reminder text is enclosed in @i(Reminder Goes Here)i@
- Newlines are included in the oracle text as a literal "\n"
- Various bits of rules text, such as the text for leveler levels, flipped cards, level-up costs, types and subtypes, zones, colors, CMCs, and a few others, are factored out into their own files, with IDs the cards crosslink to.
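The first two conventions are simple to handle with regexes. Below is a minimal sketch, assuming only the two markers described above. The helper names and the sample oracle string are made up for illustration, and real card text may have cases this doesn't cover.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Turn the literal two-character sequence \n into a real newline.
sub expand_newlines {
    my $text = shift;
    $text =~ s/\\n/\n/g;
    return $text;
}

# Remove @i(...)i@ reminder text, along with any whitespace just before it.
sub strip_reminder {
    my $text = shift;
    $text =~ s/\s*\@i\(.*?\)i\@//gs;
    return $text;
}

# Return the reminder strings, without their @i( )i@ wrappers.
sub reminders {
    my $text = shift;
    return $text =~ /\@i\((.*?)\)i\@/gs;
}

# Made-up sample in the format described above; q{} keeps \n literal.
my $oracle = q{Flying @i(This creature can't be blocked except by creatures with flying or reach.)i@\nVigilance};
print expand_newlines( strip_reminder( $oracle ) ), "\n";
print "Reminder: $_\n" for reminders( $oracle );
```

The match is non-greedy and anchored on the closing )i@, so parentheses inside the reminder text itself should not end the match early.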