
Ability Text Grammar Induction

PostPosted: 07 Jun 2015, 13:25
by YeGoblynQueenne
Hi guys.

A few years ago, 3 to be exact, I posted here and on my blog about Gleemin, a piece of M:tG software I created for my degree dissertation. At the time some folks were interested in the ability text interpreter I had created.

Well, I'm now doing a Masters in AI and, well, let's say I had an itch left to scratch and I got the opportunity I was hoping for. So I'm doing another dissertation on M:tG, this time focused on Ability Text.

My original interpreter was incomplete: it handled only a few AT expressions (deal damage, return to hand, destroy, stuff like that). It was really just a proof of concept.

The concept was that AT is not natural language, but rather more like a computer programming language, so it's quite possible to write a parser for it and then use Oracle text to drive a rules engine directly. Unfortunately the rules engine of my degree dissertation was also not too complete (no Planeswalkers, no static abilities, other bits missing). Like I say: proof of concept.

Another thing it proved was that although it is possible to make a parser and use it as above, it's also bloody hard and a bit of a game of whack-ye-mole (thou art too slow). AT keeps changing: every new set brings new keywords and Wizards keep making changes to the wording of things. My interpreter was hand-crafted, meaning I just went through the Comp Rules and a bunch of cards on the Gatherer and tried to figure out the structure behind them, then represented this in code. That sort of process has no chance of keeping up with the changes in AT over time. It's also bloody hard to get some decent coverage of the language unless you have a lot of time and resources.

So this time I'm going at it from a different angle: I'm looking for a way to derive the AT grammar from the text of the cards itself (the AT "corpus"). That process is known as grammar induction (GI).

Now, you might have heard of GI before. In short, it's another thing that's bloody hard to do (I keep putting my foot in it, eh?). Or at least it is when it comes to natural language, or any language that's too complex to describe using a regular expression (i.e. any language that isn't a regular language).

The thing is, AT is not natural language. On the one hand it's what's known as a Controlled Natural Language, a subset of a natural language specifically selected to reduce ambiguity. CNLs are usually written by companies for their technical manuals and so on, but AT has all the hallmarks of a CNL.

On the other hand, it looks to me (and also to my dissertation supervisor who had a look at the tokenised AT corpus) that AT or at least parts of it may well be possible to describe using a regular language. If so, it should then be possible to derive its grammar automatically: to learn it. There are some algorithms for this sort of thing. They're not very nice (as in, they're horrid) but they sort of work and what can't be done automagickally, you can supplement with some good old hand-crafting.

Now, a lot of the work I need to do comes down to going through a corpus (in this case, all the AT on any card ever, excluding Un-sets) and annotating sentences with Part of Speech (POS) tags, things like "verb, noun, adjective" and so on. You can then build up from that to other components for a parser. So normally I'd be furiously at it right now but it seems that I might be able to use the scripted cards from Forge to avoid having to do that by hand. So I'm using some of the work done on Forge already- thank you guys :)
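To make that concrete, here's what a single annotated sentence might look like as data (a quick Python sketch; the tags are made up for illustration, not a real M:tG tagset):

```python
# A POS-annotated Ability Text sentence, as (token, tag) pairs.
# The tags below are purely illustrative, not an official tagset.
sentence = [
    ("Lightning", "NNP"), ("Bolt", "NNP"),  # proper noun (card name)
    ("deals", "VBZ"),                       # verb, 3rd person singular
    ("3", "CD"),                            # cardinal number
    ("damage", "NN"),                       # noun
    ("to", "TO"),
    ("target", "JJ"),                       # acting as a modifier here
    ("creature", "NN"), ("or", "CC"), ("player", "NN"),
]

tokens = [tok for tok, _ in sentence]
tags = [tag for _, tag in sentence]
print(" ".join(tokens))
```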

Also, back in 2011 I discussed using my parser to generate something that other projects might be able to use, like XML or BNF. I think this would be particularly useful to new projects that don't yet have a big community to script cards for them and so need a quick way to test lots of cards in their engine. This is actually a use case for my dissertation so it's definitely still in the cards. As it were.

There's a couple of other things I have in mind, like allowing a semantic search through a card database ("get all direct damage spells" instead of searching for spells with ".* deal.* damage to .* player.*" etc). That might be useful to card database projects.
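To show what I mean about the regex way, a quick Python sketch (the card database here is a made-up toy sample):

```python
import re

# Toy card database: name -> rules text (made-up sample for illustration).
cards = {
    "Lightning Bolt": "Lightning Bolt deals 3 damage to target creature or player.",
    "Giant Growth": "Target creature gets +3/+3 until end of turn.",
    "Lava Spike": "Lava Spike deals 3 damage to target player.",
}

# The brittle, surface-level way: match the exact wording of burn spells.
# Any templating change by Wizards silently breaks this.
burn = re.compile(r"deals? \d+ damage to .*player")
hits = sorted(name for name, text in cards.items() if burn.search(text))
print(hits)  # ['Lava Spike', 'Lightning Bolt']
```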

In terms of creating a rule engine I also think it should be possible to actually learn a rule engine from the Comprehensive Rules document- it's almost like a program spec the way it's written and there's been some work on doing that sort of thing, learning things from legal texts or technical manuals or even game manuals (for Civilization) and so on. In any case I'm building an M:tG rule engine anyway for grammar validation purposes (and also just for fun) so that might be of interest to some folks also.

In any case, nice to see the forum is still going strong and people are still interested in writing M:tG software :)


(in case anyone is searching for my old posts, they are under "Ye Goblyn Queenne", with spaces. Long story, lost password & email account)

Re: Ability Text Grammar Induction

PostPosted: 07 Jun 2015, 13:45
by melvin
Great to hear that you are going to work on ability text parsing. Magarena currently does some limited ability text parsing with regular expression templates.

I've looked at generating a proper parser with a Parsing Expression Grammar, but didn't manage to integrate it with the game engine. Putting it here in case it might be relevant to your work. It is available at https://github.com/magarena/magarena/bl ... ar/mtg.peg and, when I last checked, it was able to parse about 50% of rules text. The grammar was written for the Mouse PEG generator, http://mousepeg.sourceforge.net/
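To give a flavour of what one such rule covers, here's a toy Python stand-in for a single "deal damage" production (an illustrative regex with PEG-style ordered alternatives, not the actual mtg.peg):

```python
import re

# One "deal damage" template. The alternatives are tried in order, longest
# first ("creature or player" before "creature"), much like a PEG's ordered
# choice. Purely illustrative; the real grammar has many such productions.
DEAL_DAMAGE = re.compile(
    r"(?P<source>\w[\w ]*?) deals (?P<amount>\d+) damage "
    r"to (?P<target>target (?:creature or player|creature|player))\."
)

m = DEAL_DAMAGE.match("Lightning Bolt deals 3 damage to target creature or player.")
print(m.group("amount"), m.group("target"))  # 3 target creature or player
```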

Re: Ability Text Grammar Induction

PostPosted: 08 Jun 2015, 22:18
by YeGoblynQueenne
Oh hey, hi. I remember Magarena and your parser.

Yay! I see you guys are using MCTS for the AI. AI tech ftw! :D

Just to clarify (my OP above was a bit too long): I'm planning to learn a grammar and generate a parser from it. As in machine-learn it. I've had it with hand-crafting anything. If the computer can't do it for me, I'm done with doing it by hand. And I don't mean just grammars. I mean anything!! :D

Re: Ability Text Grammar Induction

PostPosted: 15 Jun 2015, 21:10
by Huggybaby
Welcome back YGQ! I'm glad you're keeping us updated. 8)

Re: Ability Text Grammar Induction

PostPosted: 16 Jun 2015, 05:17
by proud
I am absolutely interested in seeing what you are able to do. Update us as you have stuff to share!

Re: Ability Text Grammar Induction

PostPosted: 20 Jun 2015, 20:56
by Arcanist
This is the first time I hear about "Grammar Induction". I did similar work 2 years ago using NLP techniques. Based on my experience I think your expectations are a bit unrealistic though.

Grammar induction is no silver bullet. You do not have to specify the grammar, which is nice. On the other hand, you need to identify a set of representative sentences on which to perform the inference. This would be somewhat easy if you had the parse trees at your disposal, but that is precisely what you are trying to produce in the first place!

Even if you solve that problem, there is no guarantee that the parser will produce the correct parse trees. For instance, multiple parses could be plausible for a given sentence based on your training data. How will the parser choose between them?

Finally there is a big (I mean BIG) difference between parsing some text and understanding its meaning. Grammar induction will help you learn the structure of sentences in a given language, but you will have to find a way to extract semantic information from the syntax trees it produces. Do you already know what you intend to use for that?

Do not think that I am trying to discourage you. On the contrary I think it is a very interesting project. However you should be prepared to spend a LOT of time on this. Do you already know which approach/algorithms you are going to use?

Re: Ability Text Grammar Induction

PostPosted: 01 Jul 2015, 12:09
by silly freak
I think melvin's point was, instead of adding POS annotations yourself, you can probably use an existing parser to assign those labels for a subset of abilities. That could speed up the work you need to do for learning quite a bit, if the parser recognizes 50% of rules text. It could also help a little with Arcanist's concerns, especially for choosing diverse training samples.

Re: Ability Text Grammar Induction

PostPosted: 07 Jul 2015, 23:15
by YeGoblynQueenne
Hi guys, sorry for going silent all this time. I didn't realise there were replies in the thread.

Arcanist- I think you mean this is _not_ the first time you hear about GI? :)

I don't think GI is a silver bullet. I spent a lot of the time since my post going over the bibliography. There are some hard results that basically say it's impossible to learn a language unless you pretty much already know it, or have access to someone who can provide an infinite amount of both positive and negative examples (an "oracle").

I am aware of these results and my view is that, faced with a problem that is impossible to solve, the thing to do is to avoid having to solve it at all costs. In short, I think I can work around the hardness of the problem.

The main reason why I think this is possible is that AT has been engineered to remove ambiguity, in fact ruthlessly so. What you point out- that there may be many parse trees for a given sentence- is the very definition of ambiguity, but it's much more of an issue when learning a natural language. AT is not a natural language and its syntax trees are most of the time unambiguous.

To give you a very simple example: consider abilities that have the word "you" in them. In natural English, the word "you" may refer to the second person singular or plural and it may be even used in the abstract to signify "everyone" (as in the phrase "you never know"). In AT on the other hand, "you" always refers to the same entity, the controller of an ability. That is to say: there is no ambiguity in the use of "you" in AT.

There's also another thing to consider: AT is limited in its expressive power by the physical size of M:tG cards. The theory goes anyway that the defining characteristic of natural language is that it can loop infinitely upon itself ("Jon, who is a friend of Joan, who is married to Jim who's married to Jill whom we met when we went to see Jerrald, the cousin of Jack who lives with Jane next to Jonas who..."). AT can't nest in itself like this because there is no space for it to do that on a card. Which means it's a finite language, unlike natural language which is really infinite.

If you put these two together: a finite language with, at worst, very limited ambiguity- that makes for a problem that's probably, hopefully, with lots of fancy footwork, possible to solve.

As to the syntax-is-not-semantics issue- I'm aware of that too. I have some ideas about how to tackle this but they go through parsing AT, or rather, the superset of AT that includes the Comprehensive Rules. But it's a bit early for that :)

Anyway like I say it's my MSc thesis, so I got time to solve it. I'm doing it part-time so I have another year or so to work on it. I dare say I'll come up with something interesting in that time.

Also- GI on AT is possible; it's been done before, more or less, in this MTGSalvation thread (sorry about the weird url syntax- I'm not allowed to post actual urls, they look "too spammy" :)):

http (colon) (slashslash) (3ws) (dot) mtgsalvation (dot) com/forums/creativity/custom-card-creation/612057-generating-magic-cards-using-deep-recurrent-neural

That basically goes to show that the problem is tractable. In fact the guys in that thread are learning cards wholesale- not just the AT. By comparison, AT is easy to do.

And don't worry- I'm not discouraged if you try to poke holes in my reasoning. Actually, please do :)


silly freak- you're right, I guess I misread melvin's post. I might make use of Magarena as well as of the Forge scripted cards. Like I say, I'm going to use a partial grammar to begin with, so yes, if there's already a grammar I can use then that's great. I'm kind of veering away from that line of thinking at the moment (I'm working on a more or less unsupervised algorithm) but I'll have to come back to it eventually.

Re: Ability Text Grammar Induction

PostPosted: 08 Jul 2015, 03:03
by friarsol
YeGoblynQueenne wrote:So normally I'd be furiously at it right now but it seems that I might be able to use the scripted cards from Forge to avoid having to do that by hand. So I'm using some of the work done on Forge already- thank you guys :)
This all sounds pretty cool. As one of the main proponents of the card scripts/engine on Forge, let me know if you have any specific questions about things, and how it all works.

Re: Ability Text Grammar Induction

PostPosted: 09 Jul 2015, 13:40
by YeGoblynQueenne
friarsol wrote:This all sounds pretty cool. As one of the main proponents of the card scripts/engine on Forge, let me know if you have any specific questions about things, and how it all works.
Thanks, that's good to know.

Actually I do have a question for you.

I did experiment a bit with Forge script. What I would like to do is to find a 1-to-1 mapping between the parts of the text box in a script and the ability text on a card. For instance, where you have the Forge script for Lightning Bolt:

Code: Select all
A:SP$ DealDamage | Cost$ R | ValidTgts$ Creature,Player | TgtPrompt$ Select target creature or player | NumDmg$ 3 | SpellDescription$ CARDNAME deals 3 damage to target creature or player.
I'd like to be able to map it directly to the ability text on the card, similar to this:

Code: Select all
    Spell Ability_______________________A:SP$
    Lightning Bolt______________________CARDNAME
    deals_______________________________DealDamage
    3 damage____________________________NumDmg$ 3
    to target creature or player._______ValidTgts$ Creature, Player
If I had a mapping like that then it would be very simple to treat Forge scripts as annotated AT trees and I could train a POS tagger with minimal effort.
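If I had that, the training itself could start embarrassingly simple; here's a Python sketch of a unigram (most-frequent-label) tagger over toy data (the labels are made up, standing in for whatever tags come out of the Forge mapping):

```python
from collections import Counter, defaultdict

# Toy annotated corpus of the kind the Forge mapping could yield:
# each token labelled with an illustrative, made-up tag.
annotated = [
    [("CARDNAME", "NAME"), ("deals", "EFFECT"), ("3", "NUM"),
     ("damage", "EFFECT"), ("to", "TGT"), ("target", "TGT"),
     ("creature", "TGT"), ("or", "TGT"), ("player", "TGT")],
    [("CARDNAME", "NAME"), ("deals", "EFFECT"), ("2", "NUM"),
     ("damage", "EFFECT"), ("to", "TGT"), ("target", "TGT"),
     ("player", "TGT")],
]

# Unigram tagger: each token gets its most frequent label in training.
counts = defaultdict(Counter)
for sent in annotated:
    for token, label in sent:
        counts[token][label] += 1
model = {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

print([model.get(t, "UNK") for t in ["CARDNAME", "deals", "2", "damage"]])
# ['NAME', 'EFFECT', 'NUM', 'EFFECT']
```

A real tagger would need backoff for unseen tokens, but with AT's restricted vocabulary even this gets surprisingly far.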

The problem is that Forge script is not arranged exactly like the AT on a card. See the Lightning Bolt ability above. Now, it's not a problem to rearrange the AT automatically - provided the order of the parts on a Forge script stays constant between cards and abilities.

So my question is this: is this the case? If I were to take, say, all cards that have a burn ability, like Lightning Bolt, would they all have more or less the same key-value pairs in the same order, or is the order allowed to change (in which case it's really up to the person scripting it to keep the same order)?

YeGoblynQueenne wrote:There's also another thing to consider: AT is limited in its expressive power by the physical size of M:tG cards. The theory goes anyway that the defining characteristic of natural language is that it can loop infinitely upon itself ("Jon, who is a friend of Joan, who is married to Jim who's married to Jill whom we met when we went to see Jerrald, the cousin of Jack who lives with Jane next to Jonas who..."). AT can't nest in itself like this because there is no space for it to do that on a card. Which means it's a finite language, unlike natural language which is really infinite.
Right, that came out all wrong. Should teach me to post half-asleep.

The point I was trying to make is that AT ends up having very flat parse trees, with no parenthetical phrases etc. Those are easier to learn than natural language parse trees.

It's probably possible to create infinitely many AT sentences, but their size will still be restricted, or in any case the examples of AT that we have are restricted in size. In terms of learning from examples this is very convenient because it means "set phrases" in the language are easier to identify and learn.

That's more what I meant. I think :0

Re: Ability Text Grammar Induction

PostPosted: 09 Jul 2015, 19:29
by friarsol
YeGoblynQueenne wrote:So my question is this: is this the case? If I were to take, say, all cards that have a burn ability, like Lightning Bolt, would they all have more or less the same key-value pairs in the same order, or is the order allowed to change (in which case it's really up to the person scripting it to keep the same order)?
Yes they more or less have the same key-value pairs. Certain ones are required, and others are optional depending on the AbilityEffect. They aren't necessarily in the same order. I typically order mine the same way, but we definitely don't have a required standard. However, I don't believe the order is actually important here, since you can split and custom sort by keys before you start parsing.

Here's how we parse:
Split on the : to find out if this is an Ability, Trigger, Static, or Keyword.

Take the remaining script and split on the | to get each key-value pair, then split those on the $ for Key and Value. Trim all excess whitespace.

Here are some keys that basically any SA can have:
- Cost: The cost of the spell, ability, or trigger. This is required for Spells and Abilities, not required for SubAbilities.
- ValidTgts: Needed to setup a Target object with restrictions for GameObjects, there's a handful of related keys that can make the restrictions more precise.
- Description: This comes in a few different forms, but it's the same idea each time. How the SA is going to appear on the GameObject, or the Stack. Any time "CARDNAME" shows up it will be replaced by the card's actual name, as set earlier in the script. (For cards that clone GameObjects it will use its own name instead of the copied name)
- Prompt: Improves the input prompt so we don't have to rely on the generated prompt to convert ValidTgts$ Creature.Red+YouDontCtrl into "Select target red creature you don't control"
- SubAbility: SAs that do more than one thing. Lunge "Deal 2 Damage to target creature. Deal 2 Damage to target player." is represented as an ability with a SubAbility.
- References: Telling the main ability which variables used later in the script should be associated with this SA.
- Defined: Used in cases where Targets aren't used. A gain life spell like Healing Salve, or a pump ability like Rootwalla, would both use Defined.

From there, pretty much everything else is Effect specific. In general we try to use the same keys that mean similar things across effects, but it's certainly not 100% synchronized (I've been meaning to synchronize scripts to use "Amount" instead of "NumDmg" and "NumCards", but haven't quite gotten the time to do so)
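In Python-ish terms, the splitting above comes down to roughly this (a sketch of the process, not Forge's actual Java implementation, and it ignores edge cases like a '|' inside a description):

```python
def parse_forge_line(line):
    """Split a Forge script ability line into (type, {key: value}).

    Rough sketch of the process described above: split on ':' for the
    ability type, '|' for key-value pairs, '$' for key vs value.
    """
    ability_type, rest = line.split(":", 1)
    params = {}
    for pair in rest.split("|"):
        key, _, value = pair.partition("$")
        params[key.strip()] = value.strip()
    return ability_type.strip(), params

line = ("A:SP$ DealDamage | Cost$ R | ValidTgts$ Creature,Player"
        " | TgtPrompt$ Select target creature or player"
        " | NumDmg$ 3"
        " | SpellDescription$ CARDNAME deals 3 damage to target creature or player.")
atype, params = parse_forge_line(line)
print(atype, params["NumDmg"])  # A 3
```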

We do have a Wiki here:

http://www.slightlymagic.net/wiki/Forge ... Structures
http://www.slightlymagic.net/wiki/Forge_AbilityFactory

which hopefully will give you some ideas about how it's all set up, if you need more information on that. As far as optional Parameters go, I'm sure it's not 100% up to date, as typically those will get added as new sets come out and Wizards comes up with strange ways of wording things.

Re: Ability Text Grammar Induction

PostPosted: 10 Jul 2015, 21:17
by Arcanist
YeGoblynQueenne wrote:Arcanist- I think you mean this is _not_ the first time you hear about GI?
No, I mean the opposite. I understand what GI is, and that this is what I have been doing in practice. I have just never come across the term before. I suppose it is one of those expressions that only shows up when you specifically look for it.

To add to my previous post, I am not really concerned about the feasibility of implementing a parser. I did it 2 years ago, so I know it is doable. However I only focused on producing syntax trees. I did not try to extract semantic information from them because I did not know how I wanted to represent that information.

It seems to me that you are trying to do both at the same time. If you already have an output format in mind, I suppose that is fine but you will probably end up being limited by that choice. For instance, I don't think that Forge contains all the cards in the game, or that it has a representation for rules.

Regarding your other points:

YeGoblynQueenne wrote:The point I was trying to make is that AT ends up having very flat parse trees, with no parenthetical phrases etc. Those are easier to learn than natural language parse trees.
I am not sure I agree with that. The depth of a parse tree depends both on the complexity of the sentence, and the representation you choose for the AST. In the example you gave, the parse tree is flat because you decided to represent a big chunk of the Lightning Bolt ability text ("to target creature or player") as a single entity (ValidTgts$). A representation with more granularity will result in a deeper (and more complex) parse tree.

I had a quick look at my training set. The deepest tree I could find has a depth of 20, and it corresponds to the following sentence:

"Domain- Creatures can't attack you unless their controller pays {X} for each creature he or she controls that's attacking you, where X is the number of basic land types among lands you control."

The longest sentence "only" produces a tree of depth 18.

"If at least two creatures named Wolves of the Hunt you control, one of which has "bands with other creatures named Wolves of the Hunt", are blocking or being blocked by the same creature, you divide that creature's combat damage, not its controller, among any of the creatures it's being blocked by or is blocking."

This shows that MtG cards in spite of their size can contain text that is fairly complex.


YeGoblynQueenne wrote:Also- GI on AT is possible; it's been done before, more or less, in this MTGSalvation thread (sorry about the weird url syntax- I'm not allowed to post actual urls, they look "too spammy" :)):
I have not read that thread, but I am aware of it. It has little to do with what you are trying to accomplish. If I am not mistaken, what this neural network does is work at character level to generate text that looks like a valid MtG card. The neural network has no idea what the card does. It is neat, but there are other tools floating around that do the same using alternative approaches. I have seen one that is grammar-based, and one based on a simple Markov model (called "Markov the Gathering").

Re: Ability Text Grammar Induction

PostPosted: 25 Jul 2015, 14:10
by YeGoblynQueenne
Hi guys, sorry for being so late with my replies. Been busy hacking at my project. It's going well :) I guess I'll post some results at some point, but first some answers!

friarsol wrote:Yes they more or less have the same key-value pairs. Certain ones are required, and others are optional depending on the AbilityEffect. They aren't necessarily in the same order. I typically order mine the same way, but we definitely don't have a required standard. However, I don't believe the order is actually important here, since you can split and custom sort by keys before you start parsing.
Ah. Hm. No, look, it's not that simple. Here's the problem, in a semi-formal way:

We are given a sentence in AT and an unordered set of key-value pairs in Forge script, corresponding to the AT sentence:

Code: Select all
Lightning Bolt deals 3 damage to target creature or player.
Code: Select all
A:SP$ DealDamage | Cost$ R | ValidTgts$ Creature,Player | TgtPrompt$ Select target creature or player | NumDmg$ 3 | SpellDescription$ CARDNAME deals 3 damage to target creature or player.
We would like to match each key-value pair to the corresponding set of tokens in the AT sentence, so that for example we can find the following correspondence:

corresponds_to([ValidTgts$ Creature,Player], [target,creature,or,player]).
Now, that is not possible to do unless the order of keys in a Forge script kvp set _relative_ to the tokens in an AT sentence is constant. Which, from what I understand from your comment, it is not.

Actually it _is_ possible to do the above, provided we have a complete grammar for AT, a complete grammar for Forge script and a complete mapping between the two- so for example both you and I can do the mapping because somehow, in some way (let's not even go there) we can understand the structure of the two languages. Only in that case there's no point in the exercise, since the ultimate goal is to learn a grammar for the AT language.

But of course the real problem is to write a program that can do the same mapping between AT and Forge script as I and you can do.

The good news is that this is possible to do - but it's not a simple matter of iterating through keys and AT tokens. It's actually a machine learning problem, where you have some examples of AT and the corresponding Forge script kvp sets in the input and you get the desired mapping between AT tokens and specific kvps in the output.
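Just to make the learning problem concrete, here's a crude first stab in Python- count token/key co-occurrence across examples and keep the tokens that always appear when a key does (toy data; a caricature of real alignment, nothing more):

```python
from collections import Counter, defaultdict

# Toy parallel data: (AT tokens, set of Forge keys present) per ability.
# Both the texts and the key sets are simplified for illustration.
examples = [
    ("deals 3 damage to target creature or player".split(),
     {"DealDamage", "NumDmg", "ValidTgts"}),
    ("deals 2 damage to target player".split(),
     {"DealDamage", "NumDmg", "ValidTgts"}),
    ("destroy target artifact".split(),
     {"Destroy", "ValidTgts"}),
]

# Count how often each token co-occurs with each key.
cooc = defaultdict(Counter)
key_totals = Counter()
for tokens, keys in examples:
    for key in keys:
        key_totals[key] += 1
        for tok in set(tokens):
            cooc[key][tok] += 1

def aligned(key):
    """Tokens that appear every single time the key is present."""
    return sorted(t for t, n in cooc[key].items() if n == key_totals[key])

print(aligned("DealDamage"))
```

Real alignment would also have to down-weight filler words that co-occur with everything, which is exactly where it turns into a proper machine learning problem.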

And that's an interesting problem, but I seem to be doing OK so far with my algorithm without having to solve it.

The Forge team on the other hand- and you particularly, friarsol- could benefit from solving that problem, I think. It might help you simplify or debug or even completely obviate the need for the Forge API. Although I don't forget that part of the fun to be had with Forge is in scripting new cards, and that would go out the window if the whole thing was done automagickally.

I did read (some) of the documentation of the Forge API in the wiki and also had a look at the source etc. btw, I think I understand the basics. But thanks for clarifying the bit about the ordering, I really wasn't sure about that.

Arcanist wrote:No, I mean the opposite. I understand what GI is, and that this is what I have been doing in practice.
Aaah, OK, gotcha now :)

I'm interested to hear more about what you're doing. You said something about training data and learning abstract syntax trees? That sounds pretty much like GI, yes. What is your project called? I'd like to have a look, it could be very helpful to see how others have gone about the same thing I'm trying to do.

Arcanist wrote:It seems to me that you are trying to do both at the same time. If you already have an output format in mind, I suppose that is fine but you will probably end up being limited by that choice. For instance, I don't think that Forge contains all the cards in the game, or that it has a representation for rules.
At this point I'm only trying to learn rewrite rules for a grammar, so just syntax without any interest in semantics whatsoever. I'll think about semantics later.

As to the Forge script, I'm considering it as a treebank; wikipedia has a good explanation of that: https://en.wikipedia.org/wiki/Treebank

But in short, I'd like to learn parse trees for AT sentences from the Forge script. An example of such a parse tree would be something like this:

Code: Select all
(A:SP$
   (CARDNAME Lightning Bolt)
   (DealDamage deals
      (NumDmg$  3) damage)
   (ValidTgts$  to target creature or player.))
This is basically the AT-to-Forge script correspondence I mean in my reply to friarsol above.
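Trees in that bracketed format are also easy to work with programmatically; here's a minimal Python reader for them (assuming well-formed brackets):

```python
import re

def read_tree(text):
    """Parse a bracketed (s-expression-like) tree into nested lists."""
    # Tokenise into parens and runs of non-space, non-paren characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            node = []
            while tokens[pos] != ")":
                node.append(parse())
            pos += 1  # consume ')'
            return node
        tok = tokens[pos]
        pos += 1
        return tok

    return parse()

tree = read_tree("(A:SP$ (CARDNAME Lightning Bolt) "
                 "(DealDamage deals (NumDmg$ 3) damage))")
print(tree[0])  # A:SP$
print(tree[1])  # ['CARDNAME', 'Lightning', 'Bolt']
```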

Learning the rules for the game (as opposed to the AT grammar) is a different issue; I consider the Comprehensive Rules to be a superset of AT, beyond what is found on cards. The language used in the Comp Rules is also a Controlled Natural Language and it contains a description of the rules of AT, so you can see it as a superset of AT thanks to that relation. The Comp Rules document also contains a description of the rules of the game, in the same super-AT language.

Now, I don't care so much about learning the grammar of that super-AT. What's more interesting is extracting knowledge about the rules of the game from the Comprehensive Rules document, knowledge enough to allow a learner to learn how to play the game by reading the rules.

This is not as hard as it sounds; for one thing, we (human players) have done it, so it's possible to do. It's also possible to program a computer to do it, and there's a team at MIT that's already done it, for Civilisation (the computer game) among other things (Windows help files). See here for one article in the press:

http://www.extremetech.com/extreme/9004 ... ion-manual

Now, that does go over to semantics but like I say, it's a very next step of my project. First, an AT parser, which is actually the subject of my dissertation. Then we see about the rest.


Arcanist wrote:I have not read that thread, but I am aware of it. It has little to do with what you are trying to accomplish. If I am not mistaken, what this neural network does is work at character level to generate text that looks like a valid MtG card. The neural network has no idea what the card does. It is neat, but there are other tools floating around that do the same using alternative approaches. I have seen one that is grammar-based, and one based on a simple Markov model (called "Markov the Gathering").
Again it depends, on your point of view. That neural network is learning a model of the AT language (as part of a more general model of an M:tG card). It's prooobably not learning a traditional grammar, but a traditional grammar is just one type of language model. If you take "Grammar Induction" to mean "learning a language model" then what that network does is GI. It's certainly learning some sort of structure- just don't ask me what that structure is. It's a neural network. With lots of layers. Who knows _what_ it's learning.

Arcanist wrote:This shows that MtG cards in spite of their size can contain text that is fairly complex.
They can, yes, but they tend not to and there's a limit to this complexity bounded by the physical size of the card. There's also a limit that's much harder to pin down: how much complexity is too much complexity for a game like M:tG. But it's not that important to have a good definition of that complexity- the good news is that there _is_ a limit to the complexity, which makes it possible to learn the language.

Re: Ability Text Grammar Induction

PostPosted: 26 Jul 2015, 11:50
by Arcanist
YeGoblynQueenne wrote:I'm interested to hear more about what you're doing. You said something about training data and learning abstract syntax trees? That sounds pretty much like GI, yes. What is your project called? I'd like to have a look, it could be very helpful to see how others have gone about the same thing I'm trying to do.
I would not call that a project, more like a hobby. As a result it does not have any name. I worked on it about 2 years ago: I was interested in learning NLP techniques and MtG seemed like a good data set to use due to its rather strict templating.

At some point I ended up with a training set of about 2,500 sentences in a purely syntactic Treebank format. I did not want to reinvent the wheel, so I decided to use the same format as the one described in your Wikipedia link. I have to warn you though: Creating a Treebank training set this size is not just a difficult task, it is plain insane.

Anyway, applying the corresponding grammar to approximately 15,000 sentences (everything up to Gatecrash) seemed to indicate that the approach could work. I did not have the time to take a very close look at the results though, as I had to stop playing with this due to lack of time.

Following your messages I decided to pick this up again. I tweaked the format I use, and I applied the resulting grammar to my data set again, but with mixed results. Some of the changes I made are apparently hindering parsing, and I will probably have to revert them. The main problem of inference-based approaches lies in identifying bad parse trees. Obviously bad ones are easy to find, but there are often subtle mistakes that are hard to spot. Until I double-check the entire output, there is no real way to know whether they are indeed 100% correct. As you can see, this is still a work in progress, and a lot more work is required.

I don't really want to waste too much time on that though. Right now, I am more interested in extracting semantic information from syntax trees. I am currently trying to decide which format I should use to store that information.

YeGoblynQueenne wrote:Again it depends, on your point of view. That neural network is learning a model of the AT language (as part of a more general model of an M:tG card). It's prooobably not learning a traditional grammar, but a traditional grammar is just one type of language model. If you take "Grammar Induction" to mean "learning a language model" then what that network does is GI. It's certainly learning some sort of structure- just don't ask me what that structure is. It's a neural network. With lots of layers. Who knows _what_ it's learning.
Neural networks are just a supervised learning approach one can use to approximate an arbitrary function. In the present case, the neural network seems to work at character level since it is generating non-words like "Mointainspalk". So what the neural network does is probably take a bunch of characters and arrange them to "look like" a valid MtG card. No grammar or semantics are involved.

It is a powerful approach because it can be applied as is to completely different tasks (the original goal of the program was to emulate the work of Shakespeare, not MtG cards), but it is also useless for your purpose.

YeGoblynQueenne wrote:They can, yes, but they tend not to and there's a limit to this complexity bounded by the physical size of the card.
I agree that there is an upper bound on the complexity if you limit yourself to MtG cards. There is no doubt about it. All I am saying is that a short sentence can still be very complex. "Ambiguity" from Unhinged is an example, but there are others in normal sets.

Re: Ability Text Grammar Induction

PostPosted: 29 Jul 2015, 05:28
by MageKing17
When I was working on Incantus, I wound up spending some time making the oracle parser create ability code automatically, by parsing the text.

This code, as far as I can tell, never made it into any of the source archives or repositories, because it was part of the web editor, the source for which was not released (and currently doesn't seem to be working on my site; probably due to Django updates).

So I thought I'd put it on pastebin here: http://pastebin.com/FMB7Xaqy

While this is rather out of date (last updated at the end of August in 2012, and Magic templating has changed quite a bit since then), it did manage to parse a significant number of cards without need of human input to correct it. It's possible somebody might find it useful for something, so I don't care what you do with it as long as you credit Incantus and myself if you do anything based on it.