Should semantic markup encode predictable orthographic transformations?

Thoughts and discussion regarding the philosophy of the Web, practical implementations there·of, and where we can go from here. Publicly‐accessible.

Post by Lady

The subject line is generic but this is a question about capitalization in English. Suppose you have the following T·E·I :⁠—

Code:

<p>
  This is a sample sentence that <addName>Lady</addName> is using to demonstrate
    <name>T·E·I</name> functionality for her blog.<milestone unit="sentence"/>
  This is a second sentence, mentioning <forename>æsc</forename>.<milestone
    unit="sentence"/>
</p>
The capitalization isn’t really necessary here, since the <⸺name> and <milestone unit="sentence"/> provide enough information to derive the capitalization programmatically. The special case of æsc’s name could be represented with rend="lowercase". So this could just as easily be encoded as :⁠—

Code:

<p>
  this is a sample sentence that <addName>lady</addName> is using to demonstrate
    <name>t·e·i</name> functionality for her blog.<milestone unit="sentence"/>
  this is a second sentence, mentioning <forename rend="lowercase">æsc</forename
    >.<milestone unit="sentence"/>
</p>

Does anyone have thoughts as to which of these is preferable? I lean toward the latter when strict adherence to the original source isn’t a requirement, since the markup already carries the information needed to restore the capitals, but I’m open to competing opinions.
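
To make "derive the capitalization programmatically" concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only an illustration under stated assumptions, not a recommended T·E·I processing pipeline: the set of name elements (NAME_TAGS), the helper names, and the rule "capitalize every word-initial letter inside a name" are my own choices for this example, and the sample is un-namespaced as in the post.

```python
import xml.etree.ElementTree as ET

# Assumption: which elements count as names. Extend to suit the project.
NAME_TAGS = {"name", "addName", "forename", "surname", "persName"}

SAMPLE = """<p>
  this is a sample sentence that <addName>lady</addName> is using to demonstrate
    <name>t·e·i</name> functionality for her blog.<milestone unit="sentence"/>
  this is a second sentence, mentioning <forename rend="lowercase">æsc</forename
    >.<milestone unit="sentence"/>
</p>"""


def has_letters(text):
    """True if text is non-empty and contains at least one letter."""
    return bool(text) and any(ch.isalpha() for ch in text)


def cap_first(text):
    """Uppercase the first letter found in text (sentence-initial rule)."""
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[:i] + ch.upper() + text[i + 1:]
    return text


def cap_words(text):
    """Uppercase every letter that is not preceded by another letter."""
    out, prev_alpha = [], False
    for ch in text:
        out.append(ch if prev_alpha else ch.upper())
        prev_alpha = ch.isalpha()
    return "".join(out)


def restore_case(elem, at_start=True):
    """Capitalize in place, walking elem in document order.

    Returns True if the next text encountered would begin a sentence.
    """
    if elem.tag == "milestone" and elem.get("unit") == "sentence":
        return True  # the sample places the milestone after each sentence
    if has_letters(elem.text):
        if elem.get("rend") == "lowercase":
            pass  # explicitly marked: leave the text as encoded
        elif elem.tag in NAME_TAGS:
            elem.text = cap_words(elem.text)
        elif at_start:
            elem.text = cap_first(elem.text)
        at_start = False
    for child in elem:
        at_start = restore_case(child, at_start)
        if has_letters(child.tail):
            if at_start:
                child.tail = cap_first(child.tail)
            at_start = False
    return at_start


para = ET.fromstring(SAMPLE)
restore_case(para)
print(ET.tostring(para, encoding="unicode"))
```

One caveat this surfaces: the name t·e·i only comes back fully capitalized because the interpuncts split it into one-letter words. An initialism written without separators would need its own marker (a further rend value, say) before it could be restored, which is worth weighing when deciding between the two encodings.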