On text and parsers

It’s been a long time since the last release, so I thought I’d write a little about what has been going on. The game (like lots of IF) spends a lot of time messing around parsing text and expressions. I’d got to the point where there were three separate expression parsers being used, all with slightly different “features” (and bugs) and it was getting, at best, confusing. After a lot of work there is just one, and it’s better than any of the old ones it replaced.

The final straw for doing this was wanting to write descriptions that were only applicable to characters and NPCs of certain sizes. It started with the breast fondling code, and I can finally write scripted tests like “$0.body.bust.mass > 4000” (here $0 refers to the character instigating the action, which could be the player or an NPC).
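To give a feel for what a test like that involves, here is a minimal sketch in Java. It is not the game's actual parser: the class name, the nested-map data model and the tiny "path operator number" grammar are all assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;

public class ConditionSketch {
    /** Follows a dotted path such as "$0.body.bust.mass" through nested maps (assumed data model). */
    static double resolve(String path, List<Map<String, Object>> args) {
        String[] keys = path.split("\\.");
        // "$0" selects the first argument, "$1" the second, and so on.
        Object current = args.get(Integer.parseInt(keys[0].substring(1)));
        for (int i = 1; i < keys.length; i++) {
            current = ((Map<?, ?>) current).get(keys[i]);
        }
        return ((Number) current).doubleValue();
    }

    /** Evaluates "<path> <op> <number>", with the three parts separated by whitespace. */
    static boolean test(String expr, List<Map<String, Object>> args) {
        String[] t = expr.trim().split("\\s+");
        double left = resolve(t[0], args);
        double right = Double.parseDouble(t[2]);
        switch (t[1]) {
            case ">":  return left > right;
            case "<":  return left < right;
            case ">=": return left >= right;
            case "<=": return left <= right;
            case "==": return left == right;
            default: throw new IllegalArgumentException("Unknown operator: " + t[1]);
        }
    }

    public static void main(String[] unused) {
        // Hypothetical character data; the real game objects will look different.
        Map<String, Object> actor = Map.of("body", Map.of("bust", Map.of("mass", 4500)));
        System.out.println(test("$0.body.bust.mass > 4000", List.of(actor)));
    }
}
```

A real expression parser would also need operator precedence, boolean logic and error reporting, which is where most of the work goes.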

The other big parsing challenge is generating the textual output. As this is a weight gain game, it needs to generate descriptions that are specific to the character. As there are transformations, you can’t assume the character has hands; they might be paws or something else. Then there are the changes that depend on whether it’s the player or an NPC doing the action. The input text ends up looking something like this:

  "{subj,0}gently{verb,cup}the pliant flesh of one "
+ "{part,0,bust,a1mkzn}"
+ "in{posadj}{part,0,hand,n},"
+ "{npc,0,seemingly}enjoying the sensitivity and heft"
+ "{pc,0,{comma}thinking of the way men check you out in the street}."

The code parses and substitutes the bits in {}'s. Here {subj,0} sets the subject of the sentence to the first argument, which is the person doing the action. It also outputs “you”, “the kobold”, or “Rochelle” etc., depending on whether it’s the PC or an NPC and whether the player knows their name. The {verb,cup} generates “cup” or “cups” depending on the subject. “{part,0,bust,a1mkzn}” describes a body part, in this case the bust of the character. The string of gobbledygook is a set of flags: ‘a’ use a size adjective, ‘1’ describe one breast, ‘m’ include the emotional state (if applicable), ‘k’ describe the skin type if it is different from the torso, ‘z’ extend the description, and ‘n’ include the noun. The code also re-orders adjectives so they occur in the right order before the noun. The {posadj} generates the correct possessive adjective (your, her, his). The hand part outputs hand, paw etc. for the character. Finally, the “pc” and “npc” ones apply only if the character is the PC or an NPC respectively - the PC can describe what they are thinking, but for an NPC the narrative can (usually) only describe what the player can observe.
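For anyone curious how that kind of substitution hangs together, here is a cut-down sketch in Java. It is only an illustration, not the game's code: the `TemplateSketch` and `Actor` classes are hypothetical, the verb rule just appends "s", and the {part} tag only knows about hands. It does show the main ideas: nested braces, a remembered subject, and a final tidy-up of spacing and capitalisation.

```java
import java.util.ArrayList;
import java.util.List;

public class TemplateSketch {
    /** Hypothetical stand-in for a game character. */
    static final class Actor {
        final String name; final boolean isPlayer; final String possessive; final String handNoun;
        Actor(String name, boolean isPlayer, String possessive, String handNoun) {
            this.name = name; this.isPlayer = isPlayer;
            this.possessive = possessive; this.handNoun = handNoun;
        }
    }

    private final Actor[] args;
    private Actor subject;   // set by {subj,N}, read by {verb,...} and {posadj}

    TemplateSketch(Actor... args) { this.args = args; }

    /** Expands every tag, then collapses whitespace and capitalises the first letter. */
    String render(String template) {
        String s = expand(template).replaceAll("\\s+", " ").replaceAll(" ([,.])", "$1").trim();
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    private String expand(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '{') { out.append(c); i++; continue; }
            int end = matchingBrace(text, i);
            // Pad each tag's output with spaces; render() removes the surplus later.
            out.append(' ').append(applyTag(splitTopLevel(text.substring(i + 1, end)))).append(' ');
            i = end + 1;
        }
        return out.toString();
    }

    private static int matchingBrace(String text, int open) {
        int depth = 0;
        for (int i = open; i < text.length(); i++) {
            if (text.charAt(i) == '{') depth++;
            else if (text.charAt(i) == '}' && --depth == 0) return i;
        }
        throw new IllegalArgumentException("Unclosed '{' at index " + open);
    }

    /** Splits a tag body on commas that are not inside a nested tag. */
    private static List<String> splitTopLevel(String body) {
        List<String> parts = new ArrayList<>();
        int depth = 0, start = 0;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '{') depth++;
            else if (c == '}') depth--;
            else if (c == ',' && depth == 0) { parts.add(body.substring(start, i)); start = i + 1; }
        }
        parts.add(body.substring(start));
        return parts;
    }

    private String applyTag(List<String> t) {
        switch (t.get(0)) {
            case "subj":
                subject = args[Integer.parseInt(t.get(1))];
                return subject.isPlayer ? "you" : subject.name;
            case "verb":    // naive third-person form: just append "s"
                return subject.isPlayer ? t.get(1) : t.get(1) + "s";
            case "posadj":
                return subject.isPlayer ? "your" : subject.possessive;
            case "part":    // only the hand noun is modelled in this sketch
                return args[Integer.parseInt(t.get(1))].handNoun;
            case "comma":
                return ",";
            case "pc":
            case "npc": {
                boolean wantPlayer = t.get(0).equals("pc");
                boolean isPlayer = args[Integer.parseInt(t.get(1))].isPlayer;
                return wantPlayer == isPlayer ? expand(t.get(2)) : "";
            }
            default:
                throw new IllegalArgumentException("Unknown tag: " + t.get(0));
        }
    }

    public static void main(String[] unused) {
        String template = "{subj,0}gently{verb,raise}{posadj}{part,0,hand,n}"
                + "{npc,0,{comma}seemingly pleased}{pc,0,{comma}feeling pleased}.";
        Actor player = new Actor("you", true, "your", "hand");
        Actor kobold = new Actor("the kobold", false, "her", "paw");
        System.out.println(new TemplateSketch(player).render(template));
        System.out.println(new TemplateSketch(kobold).render(template));
    }
}
```

With the player as argument 0 this prints “You gently raise your hand, feeling pleased.” and with the kobold it prints “The kobold gently raises her paw, seemingly pleased.” The {comma} tag is needed because a bare comma inside a tag would be read as an argument separator.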

Getting this stuff to work right has been a major pain. It gets even more confusing during speech, where the first person becomes “I” whether they are the PC or not.

I hope it pays off by avoiding awkward generated text!

[quote=“dingotush, post:1, topic:1174”]It’s been a long time since the last release, so I thought I’d write a little about what has been going on. [...][/quote]

Fun stuff. While it is complicated and tedious, I actually somewhat enjoy working on text parsers. Did you put in an escape sequence for the curly brace? It’s a minor detail, but one I see a few people forget when working on stuff like this.

Oh yes, I use ‘\’ as an escape character, to force the next character to be taken literally, whatever it is. Java (and a lot of Unix things) uses it too, so if I did want a ‘{’ in the output I have to write “\\{” in the Java source.
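As a rough illustration of the backslash escape (a sketch only, with a made-up `EscapeSketch` class, and with every real tag simply replaced by a `<TAG>` placeholder), the scanning loop might look something like this:

```java
public class EscapeSketch {
    /** Resolves backslash escapes and replaces each unescaped {...} tag with "<TAG>". */
    static String scan(String template) {
        StringBuilder out = new StringBuilder();
        int depth = 0;   // how many unescaped '{' are currently open
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '\\' && i + 1 < template.length()) {
                char next = template.charAt(++i);    // consume the escaped character
                if (depth == 0) out.append(next);    // outside a tag it is kept literally
            } else if (c == '{') {
                if (depth++ == 0) out.append("<TAG>");
            } else if (c == '}' && depth > 0) {
                depth--;
            } else if (depth == 0) {
                out.append(c);                       // ordinary text outside any tag
            }
        }
        return out.toString();
    }

    public static void main(String[] unused) {
        // In Java source the backslash itself has to be doubled inside a string literal.
        System.out.println(scan("A literal \\{ brace, then {verb,cup}."));
    }
}
```

This prints “A literal { brace, then <TAG>.”, so the escaped brace survives as text while the unescaped one starts a tag.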

There’s a lot of other stuff buried in the parser too. It also deals with converting numbers representing volumes, lengths and weights from the internal units (metric) to whatever unit system the user selects in the UI. It does conditionals, random text, fixes capitalisation and whitespace, and handles generating the correct pronouns and articles. The bit for cup sizes was especially awkward because a. they don’t make a lot of sense, and b. there’s no international standard (with differences between EU countries too).
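The unit conversion part is the easiest bit to show. Here is a minimal sketch in Java, assuming lengths are held internally in centimetres and weights in kilograms (I have only said “metric” above, so those specific units, the `UnitSketch` class and the output formats are assumptions for the example):

```java
import java.util.Locale;

public class UnitSketch {
    enum UnitSystem { METRIC, IMPERIAL }

    static final double CM_PER_INCH = 2.54;          // exact by definition
    static final double KG_PER_POUND = 0.45359237;   // exact by definition

    /** Formats a length held in centimetres for the chosen unit system. */
    static String formatLength(double cm, UnitSystem system) {
        if (system == UnitSystem.METRIC) {
            return String.format(Locale.ROOT, "%.0f cm", cm);
        }
        long totalInches = Math.round(cm / CM_PER_INCH);
        return (totalInches / 12) + " ft " + (totalInches % 12) + " in";
    }

    /** Formats a weight held in kilograms for the chosen unit system. */
    static String formatMass(double kg, UnitSystem system) {
        if (system == UnitSystem.METRIC) {
            return String.format(Locale.ROOT, "%.1f kg", kg);
        }
        return String.format(Locale.ROOT, "%.0f lb", kg / KG_PER_POUND);
    }

    public static void main(String[] unused) {
        System.out.println(formatLength(182.88, UnitSystem.IMPERIAL));
        System.out.println(formatMass(45.359237, UnitSystem.IMPERIAL));
        System.out.println(formatLength(170, UnitSystem.METRIC));
    }
}
```

Keeping everything metric internally and converting only at output time means the game logic never has to care which system the player picked.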