How-To sitescriptTemplate

Introduction

This wiki page describes the properties and functions found in the LHpi sitescriptTemplate file. It helps you understand what you must or can define, and how to define it.

Remarks and feedback are welcome.


Properties

The library exits with an error if any non-optional property or function is undefined, and tells you what still needs to be defined.

LHpi library version

Make sure that your script defines the version of LHpi library it uses.

The property to adapt in sitescript file is: libver

Example:

libver = "2.7"

LHpi data version

Make sure that your script defines the version of LHpi data file it uses.

The property to adapt in sitescript file is: dataver

Example:

dataver = "1"

(Semi-optional) Name of sitescript

The first thing to do is to define the sitescript filename so that the LHpi library knows which sitescript is being executed.
It is used to construct the log file name (if SAVELOG is true) and the default savepath (needed, for example, if you want to save HTML pages locally for later offline processing).

The property to adapt in sitescript file is: scriptname

Example:

scriptname = "LHpi.TheWebsiteName-v".. libver .. ".1.lua"

It is a good idea to keep the library and data versions in the filename, so that you can tell which version of the LHpi library is used without opening the sitescript file. If unset, scriptname defaults to the fallback value "LHpi.SITESCRIPT_NAME_NOT_SET-v" .. LHpi.version .. ".lua", which is probably not what you want.
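
As a small sketch, the three properties described so far could sit together near the top of a sitescript like this. The values are the ones from the examples above; "TheWebsiteName" is a placeholder.

```lua
-- Version strings and script name as they might appear in a sitescript.
-- "TheWebsiteName" is a placeholder, not a real site.
libver = "2.7"   -- version of the LHpi library this sitescript expects
dataver = "1"    -- version of the LHpi data file this sitescript expects
scriptname = "LHpi.TheWebsiteName-v" .. libver .. ".1.lua"

print(scriptname) --> LHpi.TheWebsiteName-v2.7.1.lua
```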

Regular expression to retrieve card info

The next thing to configure is the regular expression LHpi uses to retrieve all card information from the HTML page.
You can find useful information about Lua regular expressions (called patterns in Lua) in the Lua reference page: [1]

The regular expression must match all of one card's price information on the HTML page, so that each result contains at least the card name (or names, if multiple localized names are given in one entry) and the price. It should also include any other information given for the card, such as foil status and language, if that information is not constant throughout the source file.
Separating this information into its parts is done later in site.ParseHtmlData. Having all localized names is useful for setting the card language semi-automatically. If only one common entry is given for both the foil and the nonfoil price, you will need to apply some additional tricks later.

To test your regular expression, use your browser to save the HTML page of a set from your website locally.
Open that HTML page in Notepad++ and use the Find function (CTRL + F). At the bottom left of the Find window, under Search Mode, you can choose "Regular expression". Note that Notepad++ uses a different regular expression syntax than Lua patterns (for example .*? instead of .- and \d instead of %d), so the expression has to be translated for this test.
Select that option and try your regular expression. Each time you click the Find Next button, Notepad++ should highlight all the information of ONE card, with a different card highlighted on each click. Alternatively, continue writing the absolute minimum required for a runnable sitescript, set DEBUG to true and DEBUGSKIPFOUND to false, run the sitescript, and compare the log with the source data.

The property to adapt in sitescript file is: site.regex

Example: site.regex = "(<div.-name=\".-\".-foil=\"%d\".-class=\"price\">.-</span>)"
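
The pattern can also be tried directly in a standalone Lua interpreter. The sketch below runs the example pattern over a made-up HTML fragment; the markup is an assumption chosen to fit the example, not taken from any real site.

```lua
-- Hypothetical HTML fragment with two card entries (invented for illustration).
local html = [[
<div class="card" name="Plains" foil="0"><span class="price">0,10 EUR</span></div>
<div class="card" name="Plains" foil="1"><span class="price">1,50 EUR</span></div>
]]

-- The example pattern from above; ".-" is Lua's shortest-match wildcard.
local regex = "(<div.-name=\".-\".-foil=\"%d\".-class=\"price\">.-</span>)"

-- Collect every match; each one should cover exactly one card entry.
local entries = {}
for found in string.gmatch(html, regex) do
  entries[#entries + 1] = found
end

for i, entry in ipairs(entries) do
  print(i, entry)
end
```

With this fragment the loop finds two entries, one per card. If a pattern returns fewer matches than there are cards, or one match spans several cards, it needs tightening.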

(Optional) Currency

You can also define the currency used by your website when it is different from the dollar '$'. Currently, this information is not used, but it might be useful later. Once it is, it will default to "$" if unset.

The property to adapt in sitescript file is: site.currency

Example: site.currency="€"

(Optional) Regular expression about expected results

If the website displays information about the set, for example the number of cards, you can define a regular expression to retrieve it.
LHpi will log this together with the number of cards it is about to set prices for. This lets you manually check that the number of matched card entries equals the number of cards claimed in the html source file, to make sure your regex finds all entries.

The property to adapt in sitescript file is: site.resultregex

By default, property is not set.
When property is not defined, the corresponding log line will be skipped.
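
A minimal sketch of such a pattern, assuming a site that prints a line like "Results: 249 cards". Both the wording of that line and the pattern are hypothetical, and how LHpi applies the pattern internally is not shown here.

```lua
site = site or {}

-- Hypothetical pattern for a page that contains "Results: 249 cards".
site.resultregex = "Results: (%d+) cards"

-- What the pattern captures from a sample source string.
local sample = "<p>Results: 249 cards</p>"
local claimed = string.match(sample, site.resultregex)
print(claimed) --> 249
```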

(Optional) Site Encoding

When the website uses a specific encoding (e.g. UTF-8), you can define it in this property.
A correct encoding definition is needed in order for LHpi to process card data correctly and make sure that the price can be imported by MA. Both LHpi and MA expect strings to be utf-8 encoded.
The names returned from site.ParseHtmlData are converted by LHpi.LHpi.Toutf8. Currently, only "cp1252" actually does any converting, while "utf8" or "utf-8" intentionally returns the string as it was.
If you need it, you can call LHpi.LHpi.Toutf8 with other strings from your sitescript.

The property to adapt in sitescript file is: site.encoding

Example: site.encoding="UTF8"
By default, the property is not set and the encoding used is "cp1252".

Site languages

The card languages supported by the website should be defined here.

By default, English is defined, but you can add more languages when supported. The template contains all languages that are supported by MA, commented out; only the url infix is left blank and needs to be filled in.

The property to adapt in sitescript file is: site.langs

Example:

 site.langs = {
      [1] = {id=1, full = "English", abbr="ENG",  url="eng" },
      [4] = {id=4, full = "French",  abbr="FRA",  url="fra" }
 }

The format of each entry is:

[<langid>] = {id=<langid>, full = "<Fullname>", abbr="<AbbrName>", url="<ParameterValue>"}

where

  • <langid> is the ID of language as defined by Magic Album in file \Database\Languages.txt
  • <Fullname> is the full name of the language
  • <AbbrName> is the abbreviated name of the language
  • <ParameterValue> is the value that can be appended to the website url when retrieving cards in that language

site.frucs

FRUC - Foil Rare Uncommon Common table
The property defines all types of cards available on the website. This table is useful when card information is available at different urls depending on rarity or foil status. You can use the fruc strings as url infixes in site.BuildUrl.
Important note: The library assumes fruc[1] to be foil and all other frucs to be nonfoil. Thus, when you choose to import only foil or only nonfoil prices in MA's price manager dialog, the non-applicable frucs will not be imported. If the site gives foil and nonfoil prices on the same page, you still need to define two frucs; have site.BuildUrl return the same url for both. Duplicate urls for the same set will be neither downloaded nor processed twice.

The property to adapt in sitescript file is: site.frucs

Example: site.frucs = { "foil", "regular" }

Site sets

The most important property of the sitescript.

The property to adapt in sitescript file is: site.sets

It defines the mapping between Magic Album (MA) sets and the sets supported by the website.
It tells the LHpi library that the MA set with id xxx is available on the website with foil/regular prices, in which language(s), and at which url suffix.

Here is the format of one set mapping:

[<setid>]={id = <setid>, lang = { <eng avail>, [<langid>]=<lang avail>}, fruc = { <foil avail>, <regular avail> }, url = "<url suffix>"},

where

  • <setid> is the id of the set as defined by MA. See file \Database\Sets.txt
  • <eng avail> is a boolean indicating whether the price information is available for cards in ENGLISH language or not
  • <langid> is the id of a language as defined by MA. See file \Database\Languages.txt
  • <lang avail> is a boolean indicating if the price information is available for cards in language defined by <langid> (previous marker)

NOTE: The <langid> and <lang avail> markers are used together when the website supports card information in languages other than English. If no other language is available, only the <eng avail> marker needs to be set.
TODO: add more info how the card language is actually set, either here or below in ParseHtmlData

  • <foil avail> is a boolean indicating whether price of foil cards is available or not
  • <regular avail> is a boolean indicating whether price of regular cards is available or not.

NOTE: You can have more booleans in the fruc array. The number of booleans depends on the site.frucs property you defined previously. If that property is an array of 4 frucs (e.g. {"foil", "rare", "uncommon", "common"}), then the fruc property in the set mapping has 4 booleans: {true, true, true, true}, each one defining whether the price of the corresponding fruc is available or not.
TODO: add more info how the card foilage is actually set, either here or below in ParseHtmlData

  • <url suffix> is the suffix added to the website url in order to retrieve price of cards for the current set.

Example:

[800]={id = 800, lang = { true, [4]=true}, fruc = { true, true }, url = "1966"}, -- Theros

  • 800 is the id of Theros set defined by MA
  • Both English and French languages are available on the website for card prices
  • Both foil and regular prices are available on the website, assuming that site.frucs is defined as {"foil", "regular"}
  • The id of the Theros set on website is 1966. That id (=suffix) will be appended to the url when retrieving card prices of Theros.
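
Because each set's fruc list has to line up with site.frucs, a quick consistency check can catch typos early. The following is only a sketch: checkSets is a hypothetical helper, not part of LHpi, and the four-fruc setup is taken from the note above.

```lua
site = site or {}
site.frucs = { "foil", "rare", "uncommon", "common" }
site.sets = {
  [800] = { id = 800, lang = { true, [4] = true }, fruc = { true, true, true, true }, url = "1966" }, -- Theros
}

-- Hypothetical helper (not an LHpi function): report sets whose id field
-- differs from their key or whose fruc list length differs from site.frucs.
local function checkSets(sets, frucs)
  local problems = {}
  for setid, set in pairs(sets) do
    if set.id ~= setid then
      problems[#problems + 1] = "set " .. setid .. ": id field does not match key"
    end
    if #set.fruc ~= #frucs then
      problems[#problems + 1] = "set " .. setid .. ": expected " .. #frucs .. " fruc flags"
    end
  end
  return problems
end

print(#checkSets(site.sets, site.frucs)) --> 0
```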

(Optional) Card name replacement

Sometimes the card name given by the website does not match the card name defined in MA.
The two must match for the price to be set; this property maps one to the other.

The property to adapt in sitescript file is: site.namereplace

The format is:

 {
  [<setid>] = { 
     ["<website cardname 1>"]	= "<MA cardname 1>",
     ["<website cardname 2>"]	= "<MA cardname 2>"
  }
 }

where

  • <setid> is the id of the set defined by MA. See file \Database\Sets.txt.
  • <website cardname> is the name of the card as defined by the website
  • <MA cardname> is the name of the corresponding card defined by MA

Note:

  • A set can have more than one name replacement mapping (cardname 1, cardname 2, ...)
  • site.namereplace can contain replacement definitions for more than one set
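
A filled-in table might look like the sketch below. The left-hand website spellings are invented for illustration, and replaceName is a hypothetical helper that only demonstrates the lookup; it is not LHpi's own code.

```lua
site = site or {}

-- Hypothetical entries: the keys are made-up website spellings,
-- the values are the names as MA expects them.
site.namereplace = {
  [800] = { -- Theros
    ["Elspeth Suns Champion"] = "Elspeth, Sun's Champion",
    ["Heros Downfall"]        = "Hero's Downfall",
  },
}

-- Hypothetical helper: return the replacement if one is defined for this
-- set and name, otherwise return the name unchanged.
local function replaceName(setid, name)
  local setTable = site.namereplace[setid]
  return (setTable and setTable[name]) or name
end

print(replaceName(800, "Heros Downfall")) --> Hero's Downfall
print(replaceName(800, "Thoughtseize"))   --> Thoughtseize
```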

(Optional) Variants

Some sets have several variants of the same card, for example basic lands.
The variants property defines these and maps the card name given by the website to its variant in MA.

The property to adapt in sitescript file is: site.variants

The format is: { [<setid>] = { ["<website cardname>"] = { "<MA cardname>", { <variants name> } }, ... } ... }

where

  • <setid> is the id of the set defined by MA. See file \Database\Sets.txt.
  • <website cardname> is the variant name of the card as defined by the website
  • <MA cardname> is the name of the corresponding card defined by MA
  • <variants name> is the array of variant names as defined by MA. Usually, it is { 1, 2, 3, 4 }; see the default variant tables in LHpi.Data for other examples.

Example:

[800] = { -- Theros
  ["Plains"] 						= { "Plains"	, { 1    , 2    , 3    , 4     } },
  ["Island"] 						= { "Island" 	, { 1    , 2    , 3    , 4     } },
  ["Swamp 1"] 						= { "Swamp"		, { 1    , false, false, false } },
  ["Swamp 2"] 						= { "Swamp"		, { false, 2    , false, false } },
  ["Swamp 3"] 						= { "Swamp"		, { false, false, 3    , false } },
  ["Swamp 4"] 						= { "Swamp"		, { false, false, false, 4     } },
  ["Mountain"] 					= { "Mountain"	, { 1    , 2    , 3    , 4     } },
  ["Forest"] 						= { "Forest" 	, { 1    , 2    , 3    , 4     } }
}

The website uses the same name for all Plains, Island, Mountain and Forest variants, so the price for these cards will be identical for all variants.
However, the website does distinguish between the variants of Swamp. So "Swamp 1" is mapped to variant 1 of Swamp in MA, "Swamp 2" is mapped to variant 2, ...

Please note that defining a set variant in sitescript will override all default variants defined by LHpi library for that set. So make sure to define all variants in your sitescript to avoid losing variants previously defined by LHpi. Alternatively, use a namereplacement table to set the card names to the ones expected by the default variant table. In sets that have a collector number, the default variant table defined by LHpi.Data uses "name (collector number)" to denote variants, and "name" for "all variants". Sets without collector numbers use the variant name instead of the collector number.

(Optional) Foil tweak

This is similar to the namereplace table and can be used to set specific cards to foil or nonfoil explicitly.

The property to adapt in sitescript file is: site.foiltweak

The format is:

{ [<setid>] = { ["<website cardname 1>"] = { foil = <foilstatus> }, ["<website cardname 2>"] = { foil = <foilstatus> } ... } ... }

where

  • <setid> is the id of the set defined by MA. See file \Database\Sets.txt.
  • <website cardname> is the name of the card as defined by the website
  • <foilstatus> is a boolean, true for foil.

Please note that defining a set foiltweak in the sitescript will override all default foiltweaks defined by the LHpi library for that set. So make sure to define all foiltweaks in your sitescript to avoid losing foiltweaks previously defined by LHpi. Alternatively, use a namereplace table to set the card names to the ones expected by the default foiltweak table.
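
As an illustration of the format, a foiltweak table for one set could look like this. The card names are placeholders, not real entries for the Theros set.

```lua
site = site or {}

-- Placeholder card names; replace them with the names the website uses.
site.foiltweak = {
  [800] = { -- Theros
    ["Example Foil-Only Card"]    = { foil = true },
    ["Example Nonfoil-Only Card"] = { foil = false },
  },
}

print(site.foiltweak[800]["Example Foil-Only Card"].foil) --> true
```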

(Optional) Expected results

When you know exactly what LHpi should retrieve from the website, you can define these expectations in this property.

The property to adapt in sitescript file is: site.expected

The format is:

{
  [<setid>] = {  pset={ <nb of card in ENG>, [<langid>]=<nb of card in lang> }, failed={ <nb of failed in ENG>, [<langid>]=<nb of failed in lang> }, 
                 dropped=<nb of dropped card>, namereplaced=<nb of card with name replacement>, foiltweaked=<nb of card foil tweaked> },
}

where

  • <setid> is the id of the set defined by MA. See file \Database\Sets.txt.
  • <nb of card in ENG> is the total number of cards expected for the set, in English.
  • <langid> is the id of a language as defined by MA. See file \Database\Languages.txt.
  • <nb of card in lang> is the total number of cards expected for the set, in the corresponding language.
  • <nb of failed in ENG> is the total number of cards that LHpi failed to import for the set, in English.
  • <nb of failed in lang> is the total number of cards that LHpi failed to import for the set, in the corresponding language.
  • <nb of dropped card> is the total number of cards that LHpi dropped for the set, all languages combined.
  • <nb of card with name replacement> is the total number of cards for which LHpi did a name replacement, all languages combined.
  • <nb of card foil tweaked> is the total number of cards for which LHpi did a foiltweak replacement, all languages combined.

Example:

{
  [788] = { pset={ 249+11, nil, 249 }, failed={ 0, nil, 11 }, dropped=0, namereplaced=1, foiltweaked=0 }, -- M2013
}

Please note that the expectation check only occurs if the LHpi property CHECKEXPECTED is set to true.
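
The M2013 example uses positional entries instead of explicit [<langid>] keys. In Lua, the position in a table constructor is the index, so the third slot is langid 3 and the nil placeholder leaves langid 2 undefined. The short sketch below shows that both spellings describe the same values.

```lua
-- Positional form, as used in the M2013 example above.
local positional = { 249 + 11, nil, 249 }

-- The same values written with explicit language ids.
local keyed = { [1] = 249 + 11, [3] = 249 }

for langid = 1, 3 do
  print(langid, positional[langid], keyed[langid])
end
```

The keyed form is easier to read once a set has prices in languages with higher ids, since no nil placeholders are needed.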

LHpi properties

You can customize how LHpi works or reports events during price imports with the help of Global properties.

  • VERBOSE

Controls the amount of feedback/logging done by LHpi.
If unset, defaults to true.

  • LOGDROPS

Controls whether dropped cards are logged or not by LHpi.
If unset, defaults to false.

  • LOGNAMEREPLACE

Controls whether name replacements of cards are logged or not by LHpi.
If unset, defaults to false.

  • CHECKEXPECTED

Controls whether to check the import counters kept by LHpi against the expected counter values defined in site.expected. See the Expected results property.
If unset, defaults to true.

  • DEBUG

When true, LHpi logs everything (which is more than VERBOSE, but still honouring LOGDROPS and LOGNAMEREPLACE) and exits in case of errors (instead of continuing as best as it can). If unset, defaults to false.

  • DEBUGSKIPFOUND

While DEBUG is active, do not log the raw html data found by site.regex. Set this to false and check the log to debug your site.regex. If unset, defaults to true.

  • DEBUGVARIANTS

Enables DEBUG when card variants are encountered by LHpi. This probably only needs to be true if you change the library code that deals with variants.
If unset, defaults to false.

  • OFFLINE

Controls whether LHpi reads source data only from the local directory ('savepath' property) or from the site url. Use this to import old data or to save yourself and the website some bandwidth while debugging your sitescript.
If unset, defaults to false.

  • SAVEHTML

Controls whether LHpi saves a local copy of each source html to 'savepath' when not in OFFLINE mode.
When property is activated, the local directory where LHpi saves data must be writable, otherwise LHpi will just disable SAVEHTML and note it in the log.
If unset, defaults to false.

  • SAVELOG

Controls whether LHpi logs to a separate logfile (true) or to Magic Album.log (false).
If unset, defaults to true.

  • SAVETABLE

Controls whether LHpi saves prices into a file (in 'savepath') before importing to Magic Album. As with SAVEHTML, this will be set to false if savepath is not writable.
If unset, defaults to false.

  • savepath

Name of an existing directory, in the \Prices folder, where LHpi reads (OFFLINE) or writes (SAVEHTML) source html data when the corresponding properties are activated.
If this property is not set and LHpi needs it (OFFLINE or SAVEHTML activated), LHpi uses the scriptname without versioning information as the savepath.
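
As a sketch, a combination of these properties for a regex-debugging session could look like the following. The values are a suggestion, not the defaults, and OFFLINE = true assumes the source html was saved to savepath by an earlier run with SAVEHTML enabled.

```lua
-- Suggested settings while debugging site.regex (not the defaults).
DEBUG = true            -- log everything and exit on errors
DEBUGSKIPFOUND = false  -- also log the raw html matched by site.regex
OFFLINE = true          -- read previously saved html from savepath
SAVELOG = true          -- log to a separate logfile, not Magic Album.log
```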

Functions

ImportPrice ( importfoil, importlangs, importsets )

This is the main function, called by Magic Album, to import prices of the selected sets for the selected languages, in regular or foil quality.
It is the entry point where LHpi loads its library Lua file and starts the import magic.

Do not change anything in this function.

site.BuildUrl ( setid, langid, frucid, offline )

The purpose of this function is to construct the url where LHpi will retrieve HTML data containing card information about a specific set, language and FRUC.

Parameters of the function are:

  • setid: ID of the set for which url is constructed
  • langid: ID of the language to import
  • frucid: ID of the FRUC to import
  • offline: Boolean flag indicating whether the source is retrieved from a local file or from the internet

The content of the function already present in the sitescript.lua file can be used as an example.
What you should adapt for your website are the following properties in the function:

site.domain = 'www.example.com/prices/'
The domain of the website where you find the prices. Without 'http' prefix.


site.file = 'set.php?context=magic'
The name of the file appended to the domain, including any url parameters that do not concern the setid, langid or fruc.


site.setprefix = "&set="
The url parameter defining the ID of the set on the website.
The corresponding value of the parameter is retrieved from "site.sets[<setid>].url" property.


site.langprefix = "&lang="
The url parameter defining the ID of the language on the website.
The corresponding value of the parameter is retrieved from "site.langs[<langid>].url" property.


site.frucprefix = "&fruc="
The url parameter defining the ID of the FRUC on the website.
The corresponding value of the parameter is retrieved from "site.frucs[<frucid>]" property.


site.suffix = ""
Any url suffix that can be appended to the url.


The next thing you can adapt in the function, for example if some of the previous parameters are not required, is the url.
In the function, you will find a line like:

local url = site.domain .. site.file .. site.setprefix .. site.sets[setid].url .. site.langprefix .. site.langs[langid].url .. site.frucprefix .. site.frucs[frucid]
which uses all the properties previously defined in the function.
If some of them are not needed by the website, just remove them from the url construction.

The rest of the function should not be adapted, except if you know what you're doing.
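
To see what the concatenation produces, the sketch below isolates it in a hypothetical helper named buildUrlString. It uses the example values from this page and is not a replacement for the template's site.BuildUrl, which does more than build the string. Appending site.suffix at the end is an assumption.

```lua
site = site or {}
site.domain = "www.example.com/prices/"
site.file = "set.php?context=magic"
site.setprefix = "&set="
site.langprefix = "&lang="
site.frucprefix = "&fruc="
site.suffix = ""
site.langs = { [1] = { id = 1, full = "English", abbr = "ENG", url = "eng" } }
site.frucs = { "foil", "regular" }
site.sets = { [800] = { id = 800, lang = { true }, fruc = { true, true }, url = "1966" } }

-- Hypothetical helper: only the string concatenation, nothing else.
-- Appending site.suffix last is an assumption made for this sketch.
local function buildUrlString(setid, langid, frucid)
  return site.domain .. site.file
    .. site.setprefix .. site.sets[setid].url
    .. site.langprefix .. site.langs[langid].url
    .. site.frucprefix .. site.frucs[frucid]
    .. site.suffix
end

print(buildUrlString(800, 1, 1))
--> www.example.com/prices/set.php?context=magic&set=1966&lang=eng&fruc=foil
```

If the site lists foil and nonfoil prices on one page, dropping the frucprefix part makes both frucs resolve to the same url, as described for site.frucs.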


site.ParseHtmlData ( foundstring, urldetails )

The purpose of the function is to extract card information (name, price, ...) from each entry found by the site.regex property in the HTML page.

Parameters of the function are:

foundstring
The card entry found by site.regex in the source HTML page. This entry should match the information of ONE card.
If you know that the HTML page has more than one entry for the same card (e.g. one entry for the regular price, one entry for the foil price),
process only ONE entry at a time and set the correct properties on the card returned by the function.
urldetails
Table containing some information about the currently processed url.
The information is the following: { foilonly = #boolean , isfile = #boolean , setid = #number, langid = #number, frucid = #number }
Properties foilonly and isfile are the ones coming from the site.BuildUrl function.

You must first retrieve card information from foundstring with the help of regular expressions or other mechanisms that you know.
For example:

 local _start,_end,name = string.find(foundstring, 'cardname=\"(.-)\"' )
 local _start,_end,price = string.find( foundstring , 'class="price">([%d.,]+) .-</span>' )

The price retrieved from the html source can contain decimal and thousands separators. So that LHpi can support different separator conventions (1.000,00 vs 1,000.00), it is important that the price you return does not contain any.
For example, if the price on the page is '12.50' or '12,50' (12 euros/dollars/<anything else> and 50 cents), the price you should return is 1250.
LHpi will then divide that value by 100 to recover the actual amount. To make sure LHpi does not attempt to divide a string, please do an explicit tonumber conversion.
A good way to do that is:

  price = string.gsub( price , "[,.]" , "" ) -- Remove all ',' or '.' characters from the price
  price = tonumber( price ) -- Convert price String to Number
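
The two steps can be wrapped in a small function. This is a sketch under the assumption that the site always prints exactly two decimal digits; a price such as '12' or '12,5' would be misread. The name priceToCents is hypothetical.

```lua
-- Hypothetical helper: turn a price string with two decimal digits into a
-- number of cents by removing all separators.
local function priceToCents(pricestring)
  -- string.gsub returns the new string and a match count; the extra
  -- parentheses keep only the string, so tonumber gets a single argument.
  local digits = (string.gsub(pricestring, "[,.]", ""))
  return tonumber(digits)
end

print(priceToCents("12,50"))    --> 1250
print(priceToCents("1.000,00")) --> 100000
print(priceToCents("1,000.00")) --> 100000
```

Writing tonumber( string.gsub( ... ) ) without the parentheses would pass gsub's match count to tonumber as the number base, which is why the two-line form above, or the parenthesized call, is the safer choice.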
 

Once you're done, you can construct the actual newCard returned by the function with following code:

  local newCard = { names = { [urldetails.langid] = name }, price = { [urldetails.langid] = price } [, <any other properties you can find>] }
 

names and price can contain entries for multiple languages, but the langids that contain information should match between the two. Note the marker <any other properties you can find>: in the newCard table, you can define some properties in addition to the required names and price.
This information can afterwards be used in the functions site.BCDpluginPre and/or site.BCDpluginPost (explained later).

These extra properties must be defined in a table and added to newCard under the pluginData property.
Example:

  local newCard = { names = { ... }, price = { ... } , pluginData={ propA=ValueA, propB=ValueB ... } }
 

In addition to the required names and price properties, you can also set specific properties on the card that will be handled by LHpi. For each of these properties, if it is already defined by site.ParseHtmlData, LHpi.BuildCardData will keep the predefined value and skip the automatic detection and processing:

name
this, together with the setid, is the primary key (unique identifier) of the card, as far as LHpi is concerned. Multiple entries of the same card will be collected, for example to associate different foilage, language and/or variant prices. This name must be identical to the oracle name in MA (or to the localized name, if the LHpi card dataset is for only one language), or become so through namereplacement and other processing by BuildCardData (this means that, for example, "Forest Nr. 247 (fOiL)" would still be valid at this point). See the LHpi.BuildCardData source for details, or experiment. You can also set this to something unique to make sure this card triggers your namereplace, variant or foiltweak tables.
foil
a boolean that, if true, marks the card and its price as foil. It's probably a good idea to set this explicitly if you don't have distinct foil and nonfoil frucs. Still subject to change by foiltweak table entries.
drop
a boolean value that if true results in LHpi skipping further processing of this card entry and not adding a price to be imported. See #different methods of dropping cards
lang
a table of the form { <langid>=<langabbr>, ... }, where langid is a number corresponding to site.langs, and langabbr the matching abbreviation. It should only be set if the card contains a price for this language (in the price field, which will then be applied to all languages set here, or predefined by regprice/foilprice below).
variant
to override the variants defined in 'site.variants' for the setid and cardname.
regprice
a table of the form { <langid>=<price>, ... }, where langid again corresponds to site.langs, and price is the price of the nonfoil card in this language. If variants is defined, <price> instead is a subtable of the form { <varname>=<price>, ... }, where varname is a string that must correspond to a variant name that exists for this card.
foilprice
a table of the form { <langid>=<price>, ... }, where langid again corresponds to site.langs, and price is the price of the foil card in this language. If variants is defined, <price> instead is a subtable of the form { <varname>=<price>, ... }, where varname is a string that must correspond to a variant name that exists for this card.
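
Putting the pieces together, a site.ParseHtmlData for entries shaped like the site.regex example could be sketched as follows. The attribute names (name, foil), the price format with a trailing space, and the sample entry are assumptions taken from the examples on this page. Returning a card with only drop set when parsing fails is also an assumption about what LHpi accepts.

```lua
site = site or {}

-- Sketch only: assumes entries with name="...", foil="0" or foil="1",
-- and a price followed by a space inside the price span.
function site.ParseHtmlData(foundstring, urldetails)
  local name = string.match(foundstring, 'name="(.-)"')
  local foilflag = string.match(foundstring, 'foil="(%d)"')
  local price = string.match(foundstring, 'class="price">([%d.,]+) .-</span>')

  local newCard = { names = {}, price = {} }
  if not name or not price then
    -- Nothing usable was found; mark the entry to be skipped.
    newCard.drop = true
    return newCard
  end

  -- Remove separators, then convert to a number (price in cents).
  price = tonumber((string.gsub(price, "[,.]", "")))
  newCard.names[urldetails.langid] = name
  newCard.price[urldetails.langid] = price
  newCard.foil = (foilflag == "1") -- set foil status explicitly
  return newCard
end

-- Hypothetical entry and urldetails for a quick standalone check.
local card = site.ParseHtmlData(
  '<div class="card" name="Plains" foil="1"><span class="price">1,50 EUR</span></div>',
  { setid = 800, langid = 1, frucid = 1 }
)
print(card.names[1], card.price[1], card.foil)
```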

(Optional) site.BCDpluginPre ( card , setid )

Function for special-case card data manipulation.
Ties into LHpi.BuildCardData to make changes that are specific to one site and thus don't belong in the library.
This plugin function is called before most of LHpi's BuildCardData processing. At this point, card.name and card.lang have already been processed and set, and cards that were set as drop==true have already been dropped.

Parameters:

card
The card currently being processed, resulting from site.ParseHtmlData and initialized by LHpi after that function call.
a table { name=<name>, pluginData=<customdata>, names={ [<langid>]=<localname>, ...}, lang={ [<langid>]=<langabbr>, ...}, <ParseHtmlData-presets> }
setid
ID of the set currently processed.
Return card
modified card for further processing.

An example of such a function is replacing characters in all card names with an alternative.
For example:

  card.name = string.gsub( card.name , "AE" , "Æ")
  card.name = string.gsub( card.name , "Ae" , "Æ") 
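
Embedded in a complete function, the replacement could look like the sketch below. The test card and set id are made up for illustration.

```lua
site = site or {}

-- Sketch of a complete plugin: apply the replacements, then return the card.
function site.BCDpluginPre(card, setid)
  card.name = string.gsub(card.name, "AE", "Æ")
  card.name = string.gsub(card.name, "Ae", "Æ")
  return card -- the modified card must be returned for further processing
end

-- Hypothetical card table with only the field this sketch touches.
local card = site.BCDpluginPre({ name = "Aether Vial" }, 800)
print(card.name) --> Æther Vial
```

Note that a blanket replacement like this also changes names in which "Ae" is spelled out on purpose. If the site mixes both spellings, restricting the replacement to specific sets via setid, or using site.namereplace for the affected cards, may be the safer choice.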
 

(Optional) site.BCDpluginPost ( card , setid )

Function for special-case card data manipulation.
Ties into LHpi.BuildCardData to make changes that are specific to one site and thus don't belong in the library.
This Plugin function is called after LHpi's BuildCardData processing (and probably not needed). At this point, LHpi.BuildCardData has finished its processing; card.foil and card.pluginData are still present for use, but won't be needed by LHpi anymore.

Parameters:

card
The card currently processed by LHpi.
TODO: complete card table anatomy
setid
ID of the set currently processed.
Return card
modified card for further processing.