Linguistic issues and the Arabic Wikipedia

From Devwiki

Aude

Arabic is the fifth most widely spoken language by number of native speakers (according to Wikipedia), yet the Arabic Wikipedia has lagged behind in growth, and its level of participation remains highly disproportionate to the number of speakers. As one measure, there are 1,521 admins on the English Wikipedia and 301 on the German Wikipedia, while the Arabic Wikipedia has only 14. Wikipedia is not the only place in cyberspace to see such a disproportion between English and Arabic or other languages; it is a general trend across the Internet. There are a number of possible factors behind this disparity, including technical reasons from the early days of the Internet as well as social, demographic, and linguistic factors. This presentation gives an overview of these factors, ranging from the limited extent of Internet usage in Arab countries and demographic factors to linguistic reasons relating to variation in Arabic, both across dialects and relative to the standard version ("Modern Standard Arabic"), which is used on the Arabic Wikipedia.

The Arabic Wikipedia has adopted the standard form of Arabic, Modern Standard Arabic (MSA), which is used in newspapers and other media and is mutually understood by Arabic speakers from different countries and places. However, people do not learn MSA as a native language, only through schooling, and a substantial portion of the population remains illiterate. Those who do know MSA, especially those using the Internet, likely know English well and are accustomed to using English on the Internet. The lack of good Arabic-language content, as well as of a good Arabic-language search engine, has been cited as another factor keeping levels of Internet usage low. The culture of "user generated" content is also not as prevalent as in other parts of the world.

Wikipedia has the potential to serve a needed purpose by providing quality content in Arabic, though substantial challenges remain before we get further toward that end. The pool of potential contributors from Arab countries is limited to the 8.5% of the population who use the Internet (2006 estimate), and a good portion of these users would rather communicate in English. Some of them are also likely to choose to edit the English Wikipedia instead of the Arabic version, and the scope of content on the English Wikipedia makes it a much more useful resource for readers. It is also possible that a large portion of editors on the Arabic Wikipedia do not actually live in the Middle East, but have relocated to Europe, Canada, the United States, or elsewhere. A "Wikimedia Census" could help answer the question of who is editing the Arabic Wikipedia.

After this overview of the issues and challenges facing the Arabic Wikipedia, there should be a good amount of time for audience discussion.

Background

The English Wikipedia has 6,663,710 registered users, but only a small fraction of them are actually editors. As of September 2006, approximately 43,000 users had made at least five edits, and approximately 4,330 users were deemed "very active", with over 100 edits that month.[1] As of March 2008, the English Wikipedia has 1,521 admins.[2]

According to Wikipedia, Arabic is the fifth most widely spoken language (by number of native speakers). In contrast to the English Wikipedia, the Arabic Wikipedia has 115,193 registered users and only 14 admins. The number of active Wikipedians on the Arabic Wikipedia, meaning those who have made at least five edits in the past month, is approximately 400. Considered in proportion to the number of speakers, the number of users on the Arabic Wikipedia is disproportionately low: it falls behind Vietnamese, Persian/Farsi, Ukrainian, Turkish, and 27 other languages in representation. Indian languages, including Hindi, Telugu, and Marathi, are also very disproportionate in number of editors relative to the number of speakers.

On the Internet, a large majority of websites are in English. One of the first studies published on this found in 1997 that 81% of international websites were in English.[3] By 2000, this number had decreased to 68%, which was still overwhelmingly disproportionate to the number of speakers,[4] and the imbalance persists. Thus, Wikipedia is not the only place in cyberspace to see such a disproportion between English and Arabic or other languages. The amount of Arabic content on the Internet is also disproportionately small, with an estimated 100 million sites, compared to over 12 billion overall. A good Arabic search engine has also been lacking.[5]

Possible reasons for the dominance of English online may have to do with how the Internet originated. In the early years, computing standards (e.g. ASCII) could not accommodate languages written in other scripts (e.g. Arabic, Chinese, ...). Also, the first software developers and web developers/designers came from the U.S., and they used English. These reasons have subsided, but English still dominates the Internet as an "international language" or the "de facto global language". With the Internet enabling global communications, there is a need for a global language. People may use their native languages to communicate via the Internet with other locals or people within their country/region, and for informal communications (e.g. personal notes). However, English is often regarded as "more professional" and as the international standard.[6]

Within the Arab world, the percentage of the population using the Internet is still very small (2006 estimates say 8.5%). The United Arab Emirates, Bahrain, and Qatar have the highest penetration of Internet use among Arab countries.[7] One reason for the low rate is that significant portions of the population in Arab countries don't speak English or are not comfortable with the language.[8]

Wikipedia in other languages

Arabic, as well as Indian languages (e.g. Hindi, Telugu, and Marathi), is heavily underrepresented on Wikipedia. An important reason for this is likely that in places such as Egypt, English is widely taught as a second language (especially among the better educated and "elite"). Likewise in India, English is widely spoken as a second language. Also, in both India and the Arabic-speaking regions, a large number of dialects and regional variations exist.

The Arabic version of Wikipedia (as with newspapers) has adopted Modern Standard Arabic (MSA), a formal, written form of the language that is mutually understood by Arabic speakers from different countries and places. However, MSA is not what people learn to speak natively, and it is not what is used in day-to-day communication on the streets, among friends, etc.; there, people use the colloquial or local dialects. The use of two variants of a language, one for "high" purposes such as speeches, legal proceedings, and other formal situations, and one for "low" purposes in everyday life, is known as diglossia. In online communications, people in Egypt and other Arabic-speaking countries often end up communicating in English, even with other Arabic speakers. English is especially common for formal communications via the Internet. Learning English in Egypt is also deemed important for people working in the tourism industry, as well as in business and professional careers.

There is another demographic factor in Egypt and other Arab countries: a large portion of the population is illiterate in written, standard Arabic, let alone literate in English.[9] 2001 estimates say that 38.5% of the population in Arab countries is illiterate, though the rate has improved. The illiteracy rate ranges from 10.2% in Jordan to 59.8% in Mauritania, and rates in other countries are also high: Yemen (53.6%), Morocco (51.2%), Egypt (44.7%), Sudan (42.3%), and Algeria (33.3%).[10]

Internet usage in Arab countries

The percentage of Internet users among the population in most Arab countries is small, much smaller than in places such as the United States and Scandinavia. This could be due in part to socioeconomics, with a significant portion of the population illiterate or unable to read standard Arabic proficiently, let alone understand English, which is widely used on the Internet. In terms of dialects, there is also variation between rural and urban areas within the same country.

"Kamli believes the lack of an Arabic Internet search engine so far has been one of the main inhibitors of growth in Internet usage. With approximately 65% of Arab users not comfortable with the English language and only 35% that speak it, many native Arab speakers refrain from using the internet because they are not confident using English."[5]

Also, government controls on and monitoring of the Internet in some countries may discourage Internet use, but that is likely a small factor compared to the others. Note that the United Arab Emirates, which blocks websites more liberally, has also blocked online translation sites (e.g. Google Translate), which may make Internet use more difficult for people with limited English skills.[11]

Wikipedia can serve a useful purpose by providing quality content in Arabic, but challenges remain due to linguistic factors, access to the Internet, illiteracy rates, and the preference of younger people in Arab countries (especially those who are well educated and most likely to be using the Internet) to communicate in English. Regarding Wikipedia, it is possible that a significant portion of editors on the Arabic Wikipedia do not actually live in the Middle East, but have relocated to Europe, Canada, the United States, or elsewhere. A "Wikimedia Census" has been discussed in recent months on the Foundation-l mailing list, and it could help answer the question of who is editing the Arabic Wikipedia.
