Thursday, June 6, 2013

Chris Eccel's Sembase Project

Dr. Arthur Chris Eccel's Sembase Project is a highly ambitious database effort undertaken over some decades now by one of my oldest friends. I haven't mentioned it before because it's unfinished and the material available online is still limited, but the Ancient World Online website linked to it a few weeks ago, so even if it isn't quite ready for prime time, I thought I would link too. This is about an old friend, so I'm hardly an unbiased observer here.

Chris' Ph.D. is in sociology, and he's a former Professor at AUB, but his education is much more eclectic than that. He has quite a command of ancient and modern Middle Eastern languages. He gives his c.v. here; you'll note that he completed the coursework for a doctorate in Near Eastern Languages at Harvard, specializing in Arabic and Hebrew ("Taught myself Geez and Aramaic during this period," he notes), before going on to a doctorate in sociology at Chicago. (The year in law school at UCLA doesn't count, but the undergraduate majors in Greek and Arabic, the minors in Latin and Hebrew, and at least two MAs probably do.) Back in the days before the Chicago Assyrian Dictionary was available online, and before it was complete, you could find a set on his shelves. And it would be dog-eared.

After his time at AUB, Chris (fluent in several varieties of Arabic) served as Public Affairs Officer at several US embassies in the Arab world (Iraq, Saudi Arabia [Jidda consulate], Bahrain, Yemen, and Algeria among them). His last posting was Damascus.

To say that Chris is an interesting conversationalist is an understatement. People meeting him for the first time tend to consider him either utterly mad or a total genius. Like some madmen and all geniuses, he's a bit of both.

Now retired and living in Hawaii, he continues to pursue a hugely ambitious project he's dabbled in for decades: a database of all the roots of all the Semitic languages, for comparative purposes.

Chris explains the origins of his project:
In Cairo in the seventies and eighties, students, professors and researchers met in each other's apartments without notice, scouted out Egyptian bars prized for the fact that no tourists would ever be found there (including an interesting bouza dive), and broke the crust together at various restaurants, often the old Cafe Riche (the Filfila being then only a few tables in an alleyway). Sipping Stella beer, our conversation ranged from politics to our research, to the gossip of the academic expat community. It was then that I began calling to my friends' attention various Arabic roots that seemed related. The Stella encouraged some creativity in drawing these "relationships." At some point a friend asked if I was recording my observations. I was not. Another objected that with no controls, one can make anything into anything. This stimulus, and a bit more Stella, prompted the ultimate fantasy: what if there were a tool that would quickly search all essential lexical sources of all Semitic languages and display all information that might be relevant to evaluating a proposed relationship?
[Full disclosure: I was usually one of those friends at the table, including the "interesting bouza dive," which was behind the fire station in Midan ‘Ataba.] Also, that may not be everyone's definition of an "ultimate fantasy." He continues:
On a trip to Kharga Oasis, I took with me pages of typing paper cut into fourths and began recording "xliterals" (triliterals, biliterals, etc.) by assigning them to semantic categories. It occurred to me that although not all roots in a category would be related (far from it), the roots that are related should fall within the same semantic category. By recording only roots, and using sufficiently broad categories, I hoped to arrive at groups that would be amenable to human inspection, to facilitate the identification of root pairs or groups that might be related. Eventually, I went through several Arabic dictionaries, and arrived at about 34,000 records. Then I began doing the same for Hebrew. 
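For the programmers in the audience, the index-card method described above can be roughed out in a few lines of Python. This is only a sketch of the idea as Chris describes it, not of Sembase itself: the record layout, the category names, the transliterations, and the function names are all assumptions made up for illustration. Roots are bucketed by semantic category, and only roots sharing a bucket are paired up for a human to inspect.

```python
from collections import defaultdict
from itertools import combinations

# Illustrative records only: (language, root, semantic category).
# The layout and category labels are assumptions, not Sembase's schema.
RECORDS = [
    ("Arabic", "k-t-b", "writing"),
    ("Hebrew", "k-t-b", "writing"),
    ("Arabic", "s-l-m", "peace"),
    ("Hebrew", "sh-l-m", "peace"),
    ("Arabic", "s-l-m", "safety"),
]


def group_by_category(records):
    """Bucket (language, root) pairs under their semantic category."""
    groups = defaultdict(list)
    for language, root, category in records:
        groups[category].append((language, root))
    return dict(groups)


def candidate_pairs(groups):
    """List every pair of roots that share a category.

    These are only candidates for inspection; sharing a category
    does not mean two roots are actually related.
    """
    pairs = []
    for category in sorted(groups):
        for first, second in combinations(groups[category], 2):
            pairs.append((category, first, second))
    return pairs


if __name__ == "__main__":
    for category, first, second in candidate_pairs(group_by_category(RECORDS)):
        print(category, first, second)
```

The point of the broad categories is to keep each bucket small enough to eyeball: pairs are generated only within a category, never across the whole lexicon.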
So where is he now? As he notes:
Although the basic structure of the database is in place, it will need some editing. Since its inception, the 340 major semantic categories have been reduced to only around 135, and the subcategories are approaching 2,000. Even the fields for sorting are not final; I will modify the sort orders (alphabetical and phonological) when I have more languages in the table, to profit from what I have learned. At present, it has all of Biblical Hebrew and Geez. It has an aleph-through-ya' data set from several principal Arabic dictionaries, but Arabic will be comprehensively revisited when all other target languages are done. At that time, a protocol will be followed to glean from currently spoken dialects, with an emphasis on those in areas where a Semitic language was spoken at the time of the rise of Islam (the Arab conquests). The modern Semitic languages of Ethiopia will be done only as found useful. Note that the entries summarized below do not just represent the number of roots and words. Any given root or word can be entered into more than one semantic category, depending on its semantic extension.
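The last point in that passage, that entry counts are not root counts, is easy to see with a toy example. The following Python sketch uses made-up sample entries and a hypothetical `summarize` helper (neither comes from Sembase) to show how one root filed under two semantic categories produces two entries.

```python
from collections import Counter

# Illustrative entries only: (language, root, semantic category).
# The Arabic root s-l-m is filed twice, once per category.
ENTRIES = [
    ("Arabic", "k-t-b", "writing"),
    ("Arabic", "s-l-m", "peace"),
    ("Arabic", "s-l-m", "safety"),
    ("Hebrew", "k-t-b", "writing"),
]


def summarize(entries):
    """Return {language: (number of entries, number of distinct roots)}."""
    entry_counts = Counter(language for language, _, _ in entries)
    distinct_roots = {}
    for language, root, _ in entries:
        distinct_roots.setdefault(language, set()).add(root)
    return {
        language: (entry_counts[language], len(distinct_roots[language]))
        for language in entry_counts
    }


if __name__ == "__main__":
    for language, (n_entries, n_roots) in summarize(ENTRIES).items():
        print(f"{language}: {n_entries} entries, {n_roots} distinct roots")
```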
For the detailed current status, see here. I'm pretty well-informed on languages (or so I tell myself), and while I was familiar with Harari and Mehri (well, I've at least heard of them), I had to look up Gurage. It's an Ethiopian language.

It is, to say the least, an ambitious project. I wish we could tap into his database online, but hopefully that day will come.


David Mack said...

Let a very amateur linguist contribute a weak medial triliteral verb that an Egyptian friend manufactured back in my Cairo Fulbright days (64-65). When I asked what we should do before dinner that evening, he replied khalina nitbayyir. Then we proceeded to drink several large bottles of Stella beer.

Michael Collins Dunn said...

David: I've let Chris know he should check out your comment.