Should semantic markup encode predictable orthographic transformations?

Post by **Lady** » Wednesday, 29 January 2025 @ 13:22 EST

The subject line is generic but this is a question about capitalization in English. Suppose you have the following T·E·I :⁠—

Code:

<p>
  This is a sample sentence that <addName>Lady</addName> is using to demonstrate
    <name>T·E·I</name> functionality for her blog.<milestone unit="sentence"/>
  This is a second sentence, mentioning <forename>æsc</forename>.<milestone
    unit="sentence"/>
</p>

The capitalization isn’t strictly necessary here, since the name elements (<addName>, <name>, <forename>) and the <milestone unit="sentence"/> boundaries provide enough information to derive it programmatically. The special case of æsc’s name, which stays lowercase, could be marked with rend="lowercase". So this could just as easily be encoded as:

Code:

<p>
  this is a sample sentence that <addName>lady</addName> is using to demonstrate
    <name>t·e·i</name> functionality for her blog.<milestone unit="sentence"/>
  this is a second sentence, mentioning <forename rend="lowercase">æsc</forename
    >.<milestone unit="sentence"/>
</p>
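
For what it’s worth, here is a rough Python sketch of how the capitals might be restored from the lowercase encoding. It is only an illustration under some assumptions of mine: the fragment is parsed without the T·E·I namespace, the helper names and the NAME_TAGS set are made up for this example, and names are recapitalized by uppercasing the first letter of every run of letters (a guess that happens to turn t·e·i back into T·E·I, but would be wrong for plenty of real names). It also does not handle a sentence that begins directly with an element.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: these are the only name-like elements that need capitals.
NAME_TAGS = {"addName", "name", "forename"}

SAMPLE = """<p>
  this is a sample sentence that <addName>lady</addName> is using to demonstrate
    <name>t·e·i</name> functionality for her blog.<milestone unit="sentence"/>
  this is a second sentence, mentioning <forename rend="lowercase">æsc</forename
    >.<milestone unit="sentence"/>
</p>"""


def cap_first_letter(text):
    """Uppercase the first letter in text, skipping leading whitespace."""
    if not text:
        return text
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[:i] + ch.upper() + text[i + 1:]
    return text


def cap_each_letter_run(text):
    """Uppercase the first letter of every run of letters (lady -> Lady)."""
    return re.sub(
        r"[^\W\d_]+",
        lambda m: m.group(0)[0].upper() + m.group(0)[1:],
        text,
    )


def recapitalize(p):
    """Restore capitals in a <p> element, modifying it in place."""
    # The paragraph's leading text is the start of the first sentence.
    p.text = cap_first_letter(p.text)
    for el in p.iter():
        if el.tag in NAME_TAGS:
            # rend="lowercase" marks names that must stay as written.
            if el.get("rend") != "lowercase" and el.text:
                el.text = cap_each_letter_run(el.text)
        elif el.tag == "milestone" and el.get("unit") == "sentence":
            # Text after a sentence milestone starts the next sentence.
            el.tail = cap_first_letter(el.tail)
    return p


root = recapitalize(ET.fromstring(SAMPLE))
print(ET.tostring(root, encoding="unicode"))
```

The fact that the name rule has to be guessed at is itself a point to weigh: the markup says that something is a name, but not always how that name is capitalized.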

Does anyone have thoughts as to which of these is preferable? I lean toward the latter when strict fidelity to the original source isn’t a requirement, but I’m open to competing opinions.