Ability Text Grammar Induction
Posted: 07 Jun 2015, 13:25
Hi guys.
A few years ago, three to be exact, I posted here and on my blog about Gleemin, a piece of M:tG software I created for my degree dissertation. At the time, some folks were interested in the ability text interpreter I had created.
Well, I'm now doing a Master's in AI and, let's just say, I had an itch left to scratch and I got the opportunity I was hoping for. So I'm doing another dissertation on M:tG, this time focused on Ability Text (AT).
My original interpreter was incomplete: it handled only a few AT expressions (deal damage, return to hand, destroy, stuff like that). It was really just a proof of concept.
The concept was that AT is not natural language, but rather more like a computer programming language, so it's quite possible to write a parser for it and then use Oracle text to drive a rules engine directly. Unfortunately the rules engine from my degree dissertation wasn't very complete either (no Planeswalkers, no static abilities, other bits missing). Like I say: proof of concept.
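To give a rough idea of what I mean by "driving a rules engine from Oracle text", here's a toy sketch in Python. It is not my actual parser; the two templates and the engine actions are invented for the sake of the example. But it shows the basic idea: treat AT sentences as a formal language and map them straight onto engine calls.

```python
# Toy illustration only, not the actual Gleemin parser; the two templates
# and the engine actions below are made up for this sketch.
import re

# Stand-in engine actions; a real rules engine would define these properly.
def deal_damage(amount, target): return ("DEAL_DAMAGE", amount, target)
def destroy(target):             return ("DESTROY", target)

RULES = [
    (re.compile(r"^(?P<src>.+) deals (?P<n>\d+) damage to target (?P<t>creature|player)\.$"),
     lambda m: deal_damage(int(m.group("n")), m.group("t"))),
    (re.compile(r"^destroy target (?P<t>creature|artifact|enchantment)\.$", re.IGNORECASE),
     lambda m: destroy(m.group("t"))),
]

def parse_ability(text):
    """Map a recognised AT sentence onto an engine action, or return None."""
    for pattern, build in RULES:
        m = pattern.match(text)
        if m:
            return build(m)
    return None

print(parse_ability("Destroy target creature."))   # ('DESTROY', 'creature')
```

A flat list of templates like that obviously doesn't scale, which is exactly the whack-ye-mole problem I describe next.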
Another thing it proved was that although it is possible to make a parser and use it as above, it's also bloody hard and a bit of a game of whack-ye-mole (thou art too slow). AT keeps changing: every new set brings new keywords and Wizards keep making changes to the wording of things. My interpreter was hand-crafted, meaning I just went through the Comp Rules and a bunch of cards on the Gatherer and tried to figure out the structure behind them, then represented this in code. That sort of process has no chance of keeping up with the changes in AT over time. It's also bloody hard to get some decent coverage of the language unless you have a lot of time and resources.
So this time I'm going at it from a different angle: I'm looking for a way to derive the AT grammar from the card text itself (the AT "corpus"). That process is known as grammar induction (GI).
Now, you might have heard of GI before. In short, it's another thing that's bloody hard to do (I keep putting my foot in it, eh?). Or at least it is when it comes to natural language, or any language that's too complex to describe with a regular expression (i.e. any language that isn't regular).
The thing is, AT is not natural language. On the one hand, it's what's known as a Controlled Natural Language (CNL): a subset of a natural language deliberately restricted to reduce ambiguity. CNLs are usually defined by companies for their technical manuals and the like, but AT has all the hallmarks of one.
On the other hand, it looks to me (and also to my dissertation supervisor, who had a look at the tokenised AT corpus) that AT, or at least parts of it, may well be describable as a regular language. If so, it should then be possible to derive its grammar automatically: to learn it. There are some algorithms for this sort of thing. They're not very nice (as in, they're horrid) but they sort of work, and what can't be done automagickally you can supplement with some good old hand-crafting.
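To make that a bit more concrete, here's a small sketch of the usual starting point for regular-language induction algorithms like RPNI: build a prefix tree acceptor from positive examples, which the algorithm then generalises by merging compatible states. Same caveats as before: the token sequences are made up and this is only the very first step, not a full induction algorithm.

```python
# Sketch of the first step of state-merging grammar induction (e.g. RPNI):
# build a prefix tree acceptor (PTA) from positive examples. The tokenised
# sentences below are invented for illustration.
from collections import defaultdict

def build_pta(sentences):
    """States are token prefixes; edges are tokens; ends of sentences accept."""
    transitions = defaultdict(dict)   # state -> {token: next_state}
    accepting = set()
    for sentence in sentences:
        state = ()                    # the empty prefix is the start state
        for token in sentence:
            nxt = state + (token,)
            transitions[state][token] = nxt
            state = nxt
        accepting.add(state)
    return transitions, accepting

corpus = [
    ["destroy", "target", "creature"],
    ["destroy", "target", "artifact"],
    ["return", "target", "creature", "to", "its", "owner's", "hand"],
]
transitions, accepting = build_pta(corpus)
print(len(accepting), "accepting states")   # 3
```

An induction algorithm would then merge states that behave the same way, so the resulting automaton generalises beyond the exact sentences in the corpus.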
Now, a lot of the work I need to do comes down to going through a corpus (in this case, all the AT on every card ever printed, excluding Un-sets) and annotating sentences with Part-of-Speech (POS) tags: things like "verb", "noun", "adjective" and so on. You can then build up from that to the other components of a parser. So normally I'd be furiously at it right now, but it seems I might be able to use the scripted cards from Forge to avoid having to do that by hand. So I'm already making use of some of the work done on Forge. Thank you, guys!
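For anyone who hasn't run into POS tagging before, this is roughly the kind of annotation that step produces. The tag set and the tiny lexicon below are placeholders I've made up for the example; I'm not claiming this is how the dissertation corpus (or Forge) represents anything.

```python
# Purely illustrative POS tagging with a tiny hand-made lexicon.
# Tag set and lexicon are placeholders, not the real annotation scheme.
LEXICON = {
    "destroy": "VERB", "return": "VERB", "deals": "VERB",
    "target": "ADJ",   "creature": "NOUN", "player": "NOUN",
    "damage": "NOUN",  "to": "PREP",       "2": "NUM",
}

def tag(tokens):
    """Label each token with its lexicon tag, or UNK if it isn't listed."""
    return [(tok, LEXICON.get(tok.lower(), "UNK")) for tok in tokens]

print(tag("Destroy target creature".split()))
# [('Destroy', 'VERB'), ('target', 'ADJ'), ('creature', 'NOUN')]
```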
Also, back in 2011 I discussed using my parser to generate something that other projects might be able to use, like XML or BNF. I think this would be particularly useful to new projects that don't yet have a big community to script cards for them and so need a quick way to test lots of cards in their engine. This is actually a use case for my dissertation so it's definitely still in the cards. As it were.
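Just to illustrate the sort of thing I mean (the grammar below is a made-up placeholder, not anything actually induced), the idea would be to dump whatever grammar comes out of the induction step in a form other engines can consume, e.g. something BNF-flavoured:

```python
# Hypothetical export step: serialise an (invented, placeholder) grammar
# as BNF-like text that another project's tooling could read.
grammar = {
    "ability":        [["destroy_clause"], ["damage_clause"]],
    "destroy_clause": [["'destroy'", "'target'", "permanent_type"]],
    "damage_clause":  [["source", "'deals'", "number", "'damage to'", "target"]],
}

def to_bnf(rules):
    """Render each nonterminal as '<head> ::= alt1 | alt2 | ...'."""
    lines = []
    for head, bodies in rules.items():
        rhs = " | ".join(" ".join(body) for body in bodies)
        lines.append(f"<{head}> ::= {rhs}")
    return "\n".join(lines)

print(to_bnf(grammar))
```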
There are a couple of other things I have in mind, like allowing a semantic search through a card database ("get all direct damage spells" instead of searching for spells with ".* deal.* damage to .* player.*" etc). That might be useful to card database projects.
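The point is that once abilities are parsed into structure, "get all direct damage spells" becomes a filter over that structure rather than a regex over raw text. A rough sketch, with a parse representation I've made up on the spot:

```python
# Semantic search over (invented, simplified) ability parses instead of
# regex-matching raw Oracle text.
cards = {
    "Shock":      {"action": "deal_damage", "amount": 2, "target": "creature_or_player"},
    "Lava Spike": {"action": "deal_damage", "amount": 3, "target": "player"},
    "Doom Blade": {"action": "destroy",     "target": "creature"},
}

def direct_damage_spells(db):
    """'Get all direct damage spells': damage-dealing abilities that can hit a player."""
    return [name for name, parse in db.items()
            if parse["action"] == "deal_damage"
            and parse["target"] in ("player", "creature_or_player")]

print(direct_damage_spells(cards))   # ['Shock', 'Lava Spike']
```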
On the subject of rules engines, I also think it should be possible to learn a rule engine from the Comprehensive Rules document: it reads almost like a program spec, and there has been some work on that sort of thing, learning rules from legal texts, technical manuals and even game manuals (for Civilization, for instance). In any case, I'm building an M:tG rules engine anyway, for grammar validation purposes (and also just for fun), so that might be of interest to some folks too.
Anyway, it's nice to see the forum is still going strong and that people are still interested in writing M:tG software.
(In case anyone is searching for my old posts, they're under "Ye Goblyn Queenne", with spaces. Long story: lost password and email account.)