Buddhism, HTML and diacritics

Reposting this reference post since the demise of my previous blog. I also have an updated page here, with a focus on Sanskrit.

If you want to impress your friends (or your blog readers…*ahem*) when you talk about Buddhism, why not use some HTML diacritics?

You see, most of the Buddhist terms you read about derive from one or more non-European languages:

  • Sanskrit: the sacred language of Hinduism and of much Indian religious literature. Now a dead language.
  • Pali: an ancient language in India, mostly used for trade. It was popular as a lingua franca. Also a dead language.
  • Classical Chinese: this is how Chinese was in the olden days. There are more Buddhist texts preserved in Classical Chinese than any other language.
  • Japanese: actually, most Japanese Buddhist terms are really just Classical Chinese with Japanese pronunciations, as was the style back then.

None of these languages natively uses the Roman script the way Western European languages do, so it's up to translators to figure out how to Romanize them. To capture the sounds that don't exist in English, linguists reuse Roman letters but add extra marks to them: diacritics.
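Unicode itself reflects this idea: a letter like ā can be broken down into a plain Roman letter plus a separate combining diacritic. The sketch below uses Python's standard `unicodedata` module to show that breakdown. The helper name `split_diacritics` is made up for this example and does not come from any library.

```python
import unicodedata

def split_diacritics(letter: str) -> list[str]:
    """Return the Unicode names of a letter's decomposed (NFD) parts."""
    # NFD splits a precomposed letter into its base letter + combining mark(s).
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", letter)]

for letter in ["ā", "ś", "ṃ", "ṇ"]:
    print(letter, "->", " + ".join(split_diacritics(letter)))
# ā -> LATIN SMALL LETTER A + COMBINING MACRON
# ś -> LATIN SMALL LETTER S + COMBINING ACUTE ACCENT
# ṃ -> LATIN SMALL LETTER M + COMBINING DOT BELOW
# ṇ -> LATIN SMALL LETTER N + COMBINING DOT BELOW
```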

Until fairly recently, it was difficult to display non-standard Roman characters on a webpage. Back then, readers had to download special fonts, and their browsers had to be able to use them.

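Now, though, as the Internet becomes more international, you can display pretty much any Romanized character you want using special numeric character codes in HTML. These are often called “extended-ASCII” codes, but strictly speaking they are Unicode code points: extended ASCII stops at 255, and most of the letters below have much larger numbers.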

For example, let's say I want to display an ā character. In the old days, I could use a program like Character Map on Windows or Character Palette on Mac to copy and paste it (if I could find it). Now I can just type the HTML code &#257;. It is written as one word with no spaces: an ampersand, a pound sign, the code number, and a semicolon. When you put these together, the web browser automatically translates the code into the letter you want.
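You can check what a code turns into without opening a browser. The sketch below uses Python's standard `html` module. Its `html.unescape` function decodes character references, which should match how a browser reads them. The number inside the code is simply the letter's Unicode code point, and `ord` returns that number.

```python
import html

# Decode the character reference for ā.
letter = html.unescape("&#257;")
print(letter)                              # ā

# The number is the letter's Unicode code point, written in decimal.
print(ord(letter))                         # 257

# HTML also accepts the same code point in hexadecimal, written as &#x...;
print(html.unescape("&#x101;") == letter)  # True
```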

All of these HTML character codes have the format

&#(number);
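If you'd rather not memorize the numbers, a short script can look them up for you. This is a minimal Python sketch. The function name `to_html_codes` is hypothetical and not part of any library. It leaves plain ASCII alone and turns every other character into its `&#(number);` code. It also normalizes to NFC first, on the assumption that you want a single precomposed code (like 257 for ā) even if your text contains a plain letter followed by a separate combining mark.

```python
import unicodedata

def to_html_codes(text: str) -> str:
    """Replace every non-ASCII character with an HTML numeric code like &#257;."""
    # NFC joins "a" + combining macron into the single letter ā (code 257).
    text = unicodedata.normalize("NFC", text)
    # ASCII characters (codes 0-127) pass through unchanged.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

print(to_html_codes("Amitābha"))    # Amit&#257;bha
print(to_html_codes("Śākyamuni"))   # &#346;&#257;kyamuni
```

Python's built-in `"Amitābha".encode("ascii", "xmlcharrefreplace")` produces the same decimal codes (as bytes). The small function above just makes the NFC step explicit.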

So the trick is just remembering which number you want and filling in the blank. Remember that you have to do this for each special letter you want to display. Here's a helpful chart of some commonly used diacritics and letters for Buddhist terms. Most are for Pali/Sanskrit, but Japanese terms use the long vowels too (ā, ī, ō, ū):

  • á – 225, the a with an acute mark
  • é – 233, the e with an acute mark
  • ñ – 241, the n with a tilde over it
  • ú – 250, the u with an acute mark
  • ā – 257, the long “ah” sound
  • ī – 299, the long “ee” sound
  • ō – 333, the long “oh” sound
  • ś – 347 (346 for upper case), the s with an acute mark
  • ū – 363, the long “oo” sound
  • ḍ – 7693, the retroflex “d” (tongue curled back against the roof of the mouth)
  • ḥ – 7717, a breathy “h” at the end
  • ḷ – 7735, the retroflex “l” in Pali (a vowel “l” in Sanskrit)
  • ṁ – 7745, a nasal “m” sound (anusvara); some systems write it with a dot above instead of below
  • ṃ – 7747, the anusvara written with a dot below, often pronounced like “ng”
  • ṅ – 7749, the “ng” sound, as in “sing”
  • ṇ – 7751, the retroflex “n”
  • ṛ – 7771, the vowel “r” (vocalic r), often pronounced roughly like “ri”
  • ṝ – 7773, a longer vowel “r”
  • ṣ – 7779 (7778 for upper case), the retroflex “sh” sound
  • ṭ – 7789, the retroflex “t”
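To double-check a number from the chart, or to look up a letter that isn't listed, Python's standard `unicodedata` module can report each character's official Unicode name next to its code. This small sketch prints the chart's letters in the same format as the list above. The `CHART` string is only an illustration, so add whatever letters you need.

```python
import unicodedata

# The letters from the chart above, in the same order.
CHART = "áéñúāīōśūḍḥḷṁṃṅṇṛṝṣṭ"

for ch in CHART:
    # ord() gives the decimal number to put inside &#(number);
    print(f"{ch} – {ord(ch)}, {unicodedata.name(ch).lower()}")
```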

Try it out on your webpages and see how well it works for you. After a few tries, it gets much easier to represent Buddhist terms accurately in English, and you can pass yourself off as a Buddhist scholar or something. 😉

For more detail, check out this excellent reference:

http://www.nitartha.org/diacritics_how_to.html

Namo Amida Butsu


Author: Doug

A fellow who dwells upon the Pale Blue Dot and spends his days obsessing over things like Buddhism, KPop music, foreign languages, BSD UNIX and science fiction.

4 thoughts on “Buddhism, HTML and diacritics”

  1. I study several languages, and I have a master's in English from UCLA, and I do not understand what you explained. I am a Buddhist in my philosophical thinking, and would like to know why I have to use diacritical marks to blog about a Buddhist subject.


  2. You don't have to use anything, really. However, I posted this as an easy reference for others. Many Buddhist terms use letters and sounds that are not usually found in the standard Roman alphabet, and people have devised several different ways to render them (Pali terms, for example).

    This page illustrates the challenges of printing the diacritics:

    http://www.accesstoinsight.org/lib/authors/bullitt/learningpali.html

    You can see from that page that people have come up with many ways of doing this, and some are kind of awkward. Unicode HTML codes (method 7) weren't well established years ago, but as Internet technology improves, they're less awkward to use than lots of lower/upper case letters.

    Like I said, you don't have to use them, but people often do. I have to chase down the code numbers a lot, so I collected this info from a few places and posted it for my own reference. Others I know use it too.

    Thanks!

