Getting to know Sogdian

By Adam Benkato

(Originally posted in September 2014)

This is a very brief overview giving directions for those interested in studying or becoming familiar with the Sogdian language and its source materials. I’ll try my best to provide as many links and references as possible, but since this should be considered more as a collection of signposts than as a detailed, technical introduction, those interested should follow the references to get more depth. One caveat: if anyone wants to pursue Sogdian seriously at all, a reading knowledge of German is pretty important, and French wouldn’t be bad either.

It has been a little over one hundred years since the discovery of the first major Sogdian texts by the Turfan expeditions of the early 1900s. Since then further Sogdian texts have been discovered at more or less regular intervals, and since excavations are ongoing in parts of former Sogdiana and neighboring regions, it is likely that new texts will continue to turn up. This is hopefully a good thing for my future employment prospects. The story of the discovery and decipherment of Sogdian is fascinating, but its retelling will have to remain for another day.

The following sections attempt to provide a few overviews. First come the scripts used to write Sogdian, followed by a discussion of the major collections which preserve Sogdian manuscripts (‘manuscripts’ and ‘fragments’, by the way, are essentially interchangeable, since nearly all Sogdian manuscripts survive not complete but in fragments!). Then comes an overview of Sogdian texts in approximate chronological order, giving information about their find-sites and textual characteristics, indicating the collection(s) in which they are presently preserved, and pointing to selected relevant publications.


a) the ‘Sogdian’ (sometimes called ‘national’) script, based on the Imperial Aramaic alphabet, the origins of which lie in the days of Sogdiana as an Achaemenid province:

SO 14000

b) the ‘formal’ (sometimes called ‘Sūtra’) script, a variant of the Sogdian script:

SO 18249

c) the ‘Syriac’ (namely Estrangelo) script:

N489, being E27 (formerly C2)

d) the ‘Manichaean’ script, which was developed by Manichaeans to write their texts in any language:


e) a few Sogdian texts are also written in Brahmi, and some are transcribed into Chinese characters:


Unfortunately, there are not really any textbooks that focus specifically on teaching students to read the different scripts and understand their unique writing conventions, or that lay out paleographical analyses. For a) and b), for the time being one must compare a trustworthy edited text with a digital image and a table of letters, and simply teach oneself. We hope to address this need soon. For c), any guide to the Syriac alphabet will do, but one should be aware of Sogdian writing conventions. For d), the useful online primer of P.O. Skjærvø is the best bet, and one can practice identifying the letters via this Memrise exercise.


Sogdian texts are held in a number of different collections around the world, each of which uses its own numbering system and publishes its own catalogues. The major obstacle arises from the fact that different expeditions (especially to Turfan) from different countries often collected fragments belonging to a single manuscript. Those fragments were subsequently dispersed around the world, assigned find-signatures or shelf-numbers according to each collection’s unique system, and catalogued independently of one another. This means that scholars often have to reconstruct texts by combining manuscript fragments held in disparate collections, which is not always an easy task! (For the image see Reck, 2009, ‘Soghdische manichäische Parabeln in soghdischer Schrift’)

An example of fragments combined from different collections, from Reck (2009)

The major collections of Sogdian texts are in:

BERLIN: The Turfanforschung of the Berlin-Brandenburg Academy of Sciences, whose online catalogue Digitales Turfan-Archiv contains digital images of pretty much every fragment in their possession. It is the largest collection of texts from the oasis of Turfan. To look at Sogdian fragments, simply browse to M (for Manichaean script), n (for ‘Nestorian’ script), So-Ch/So (for Sogdian script), or even Ch/U (some Sogdian script fragments previously thought to be Uighur). But how will you know which fragments you want to read, or what has been published on a given fragment? For that you’ll need a catalogue. Pick up Reck (2006) for Manichaean (in content) texts written in Sogdian script, Sims-Williams (2012) for Christian texts in Syriac script, and Boyce (1960) for texts in Manichaean script (she includes all, not just Sogdian texts, in M script, so refer to her topic index). Morano is preparing a more detailed catalogue of Sogdian texts in Manichaean script, but for now only his ‘work-in-progress’ is available (Morano, ‘A Working Catalogue of the Berlin Sogdian Fragments in Manichaean Script’). See also Reck’s catalogue of Buddhist fragments in Sogdian/formal scripts (2016) and of Christian/magic/medical/miscellaneous fragments in Sogdian script (2018). These catalogues are extremely useful since they definitively replace (though concord with) the old, extremely ambiguous and confusing, find-signature system that was previously in use.

LONDON: The Stein collection of the British Library holds texts from Dunhuang, as well as Turfan and some other Central Asian sites. There exists no catalogue as such for the Sogdian texts, but they are comparatively few in number and have nearly all been published with good descriptions in Reichelt (1928–1931) and Sims-Williams (1976), with MacKenzie (1970) and (1976) being updated re-editions of certain texts. Further re-editions of the ‘Ancient Letters’ will be discussed below. These texts are usually referred to by name rather than shelf or catalogue number.

ST. PETERSBURG: The Russian Academy of Sciences possesses two major collections of Sogdian texts: the Mugh documents and fragments from Turfan. The collections have presumably been digitized as part of the IDP agreement, but are not yet available online. Ragoza (1980) is a catalogue and edition, in Russian, of the Sogdian Turfan fragments, but it must be used with caution and supplemented with later re-editions. These fragments are referred to by L-number (for Leningrad).

PARIS: The Pelliot collection in the Bibliothèque Nationale holds about 30 Sogdian texts from Dunhuang (often referred to by P-number), and a handful of Sogdian-Uighur texts also from Dunhuang. There is no separate catalogue, but a description of all the Sogdian fragments is included in the edition of Benveniste (1940), and Sogdian-Uighur texts in Sims-Williams–Hamilton (1990) and (2015).

KYOTO: The Ōtani collection houses a number of Sogdian fragments, most quite small, which were catalogued and edited by Kudara, Yoshida & Sundermann (1997), in Japanese. These fragments are referred to with an O plus catalogue-number (e.g. O7543).

Finally, there are some smaller holdings of Sogdian texts, none of which exceeds a few fragments. These include the Mannerheim collection in Finland (see e.g. Sims-Williams & Halén 1980).

Major epigraphy

Here, we will look at what I call the major Sogdian primary sources. There isn’t really any strict definition of ‘major’; in fact, most discoveries of texts have been pretty major, involving either large numbers of manuscripts/fragments or lengthy and historically important documents. Most such texts will be discussed in this part, but there are a number of smaller texts (such as coin legends, ostraca, small single inscriptions, etc.) that we’ll leave to a third part.

The following overview proceeds chronologically. Since dates of texts are not always known or certain, I’ve simply drawn from what estimations are provided in the scholarly literature – so dates and their order are not to be taken as fact, but are more to help give a rough idea. Furthermore, the references provided are not exhaustive. A bibliography which I’ve compiled contains more, but if you really want to learn about a particular text or group of texts in-depth, you’ll have to follow the references starting from those selected here, all of which can be located in the bibliography.

Kultobe Inscriptions

The Kultobe inscriptions are a group of dedicatory inscriptions on the foundation stones (and similar) of a fortress in Kultobe, in present-day southern Kazakhstan; they are now preserved in the Central State Museum of Kazakhstan. The site would have been in the north of Sogdiana; the inscriptions seem to allude to military operations along the border of Sogdiana with more northern nomads. They were first edited in Sims-Williams & Grenet (2006), publ. of nos. 1–8, and Sims-Williams, Grenet & Podushkin (2007), publ. of nos. 9–11. No dates are found in the extant inscriptions, but since certain features of script and orthography are somewhat more archaic than those of the next-oldest Sogdian documents, these inscriptions are thought to be significantly older.

Ancient Letters

The group of documents known as the ‘Ancient Letters’ (sometimes ‘Alten Briefe’) are so called because they were the oldest Sogdian texts known until the discovery of the Kultobe inscriptions, and are still the oldest ones of significant length. They are letters from Sogdian merchants away from the homeland to contacts back home in Samarkand that were found, still unopened and sealed in a postal bag, by Aurel Stein in a watchtower between Dunhuang and Lou-lan on the route to Samarkand. We can thus surmise that they never reached their destinations, and thus that their recipients may never have been informed of the dramatic events described in the letters. The date of 312–313 CE is generally accepted for the composition of the letters (see Henning 1948 and Grenet & Sims-Williams 1987). There are five well-preserved, nearly-complete letters, a moderately-preserved fragment of a sixth letter, and smaller fragments of perhaps two more letters. The letters are written in the Sogdian script, although in an archaic form that is less cursive than that used in the Mugh documents and represents a stage closer to the Aramaic script. The Ancient Letters also make use of a number of heterograms not used in later writing in Sogdian script (see here) and contain a number of Indian words for commercial terms, indicating perhaps close relations between Sogdian and Indian merchants. The letters were not really deciphered until the work of Hans Reichelt, whose editio princeps was published in 1931. A modern edition of the letters as a whole is lacking, but letters one (Sims-Williams 2005), two (Sims-Williams 2001), three (Livshits 2008), and five (Sims-Williams et al. 1998) have been recently re-edited.

Upper Indus graffiti

The next major epigraphical corpus is that of Shatial, located near Gilgit in what is now northern Pakistan. The site consists of nearly 600 short inscriptions in Sogdian (together with a handful in Bactrian, Middle Persian, and Parthian) made by merchants visiting the location over a particular trade route. The inscriptions for the most part consist of only a personal name, sometimes with patronymic or other identifier, and only a few consist of a longer phrase; the purpose of all, it seems, was to commemorate the named individual’s visit. The script and certain archaisms of the language bear similarities to the Ancient Letters (see Sims-Williams 2000: 533–4), and so the inscriptions probably date from a period ranging from the 3rd century CE to the 7th century. The seminal edition of the inscriptions, along with a discussion and photographs, is Sims-Williams 1989/1992.

Mugh Documents

Mount Mugh is the site of a castle east of Panjikent (present-day Tajikistan), overlooking the Zerafshan and Qom rivers, which was a refuge of the last rulers of Panjikent fleeing the conquering Muslim armies in the early 8th century. In 1932 and 1933, some Sogdian documents were discovered there, which turned out to be part of the archives of Ðēwaštīč (r. 706–722), one of the final Sogdian pre-Islamic rulers. The materials from Mugh (including 74 Sogdian documents, an Arabic one, and a handful in Turkish and Chinese) are now in the Oriental Institute of St. Petersburg. The entire Sogdian corpus was initially edited by a team of Russian scholars, and recently re-edited by Livshits (2008), taking into account a number of articles dealing with specific aspects of the Mugh texts; the latter work will soon appear in English translation (Livshits fthc.). Many individual linguistic and historical articles are also devoted to aspects of the Mugh texts. The documents are all written in a variety of the Sogdian script that is more developed than that of the Ancient Letters but also differs from that of the later Manichaean texts.

Buddhist Texts

Buddhist texts in Sogdian come from both Turfan and Dunhuang, and were probably produced in the 7th and 8th centuries CE. These texts are typically written in the variety of the Sogdian script known as the ‘formal’ script (sometimes called ‘sutra’ script), but are also commonly found in other varieties of the Sogdian script. In nearly all cases the texts have been identified, and are translations from identifiable Chinese originals, though it is sometimes the case that an exact parallel is no longer extant in Chinese. The Sogdian transcriptions of Sanskrit and Chinese names and words are extremely fruitful material for the study not only of Sogdian, but also of the Sanskrit (or Prakrit) and Chinese pronunciation of that era (see especially Provasi 2013). A helpful guide to the Sogdian Buddhist lexicon was prepared by MacKenzie (1971); for a survey of scholarship until the mid-1970s see Utz 1978; for a study of the Chinese-to-Sogdian translation method, see Yoshida (2013). The texts held in Paris were all edited by Benveniste (1940, 1946), while MacKenzie was responsible for the updated publication of those held in London (1970, 1976). A large number of Buddhist fragments from Turfan are held in Berlin and have been edited in articles especially by Sundermann, Kudara, Yoshida, and Reck (whose catalogue of Buddhist fragments in Sogdian script appeared in 2016).

Christian Texts

Nearly all the Christian texts (some 600 in number) in Sogdian come from a ruined monastery called Shüi-pang in Bulayïq, a site in the northern part of the Turfan oasis; a few come from other sites within Turfan. All seem to be translations from Syriac originals, and are chiefly written in a variety of the Syriac script which has been adapted for Sogdian (about 550), though some fragments are in the Sogdian script (about 50). These texts are generally dated between the 9th and 13th centuries CE. The best overview of the Christian literature in Sogdian is to be found in Sims-Williams (2009), while Dickens (2009) provides a useful overview of Christian texts from Turfan in general. The recent catalogue of Sims-Williams (2012) is where one should look for descriptions of texts in Syriac script, while Reck (2008) is a useful overview of those in Sogdian script. Regarding major editions of texts, first and foremost is Sims-Williams’ (1985) edition of the C2 (now E27) codex, which along with GMS (Gershevitch 1954) is probably the most frequently cited work on Sogdian and an absolutely necessary reference for the study of any variety of Sogdian. Two other major editions have recently been published (see Sims-Williams 2014 and fthc.).


Karabalgasun Inscription

The Karabalgasun inscription is a commemorative trilingual (Sogdian, Chinese, Uighur) inscription on a large granite boulder in present-day Mongolia, commissioned by the eighth khagan (r. 808–822) of the Uighur steppe empire, of which the official religion was Manichaeism from about 762/3 CE until its collapse in about 840 CE. Much value lies in the fact that many historical events are only known from this inscription and that it describes the adoption of Manichaeism by the Uighur khagans. Ironically, it was actually the first original Sogdian text to become known, though it was mistakenly thought to be Uighur from its discovery in the late 1800s (before the discovery of the Turfan texts) until F.W.K. Müller (1909) recognized it as Sogdian. A good overview can be found in Yoshida (2011), which gives references to intervening studies and new readings; Yoshida (1990) provides an English summary of his earlier Japanese re-edition of the entire inscription.

Manichaean Texts

The Manichaean texts in Middle Iranian were discovered in Turfan by a number of expeditions from different countries, and so the texts are preserved in Berlin, Kyoto, St. Petersburg, London, and China. The Sogdian part thereof consists of a wide range of texts, in both Manichaean and Sogdian scripts, from the various locales (such as Qočo, Toyoq, Bäzäklik) within Turfan. These were the first original Manichaean sources to become known, though in some cases they are extremely fragmentary. They are datable to the 8th through 10th centuries CE. Publications thereon are far too numerous to list exhaustively, but major editions of texts are usually published in the Berliner Turfantexte series. Among those, the editions of Sundermann are of utmost importance (see 1981, 1985, 1990, 1992, 1997), and he also devoted many articles to smaller problems. Henning’s study (1936) of the largest Manichaean codex (mostly in Western Middle Iranian, with parts in Sogdian) is still a standard work, and nearly all his publications continue to be indispensable. The handbook of Gershevitch (1954) is still the only general guide to aspects of grammar. Finally, there are numerous publications of Sims-Williams, the Turfanforschung group (Durkin-Meisterernst, Reck), and Morano, as well as nearly every other scholar of Sogdian since the early 1900s. Those who want an easily accessible translation of Manichaean texts with references to primary sources should consult Klimkeit (1993).


Sogdian-Uighur Documents

A group of documents from Dunhuang dates from the later part of the era of Sogdian as a written language and reveals a heavy Turkish (i.e. Uighur) influence on written Sogdian. There is also a degree of Chinese influence in the texts. For the latest edition of the relevant sources, see Sims-Williams & Hamilton (1990), and Yoshida (2009) for further grammatical remarks.