For those of you who have not heard, Oyu Tolgoi is a joint venture mining company between Rio Tinto, Turquoise Hill Resources (formerly Ivanhoe Mines), and the Mongolian Government, formed to develop the Oyu Tolgoi copper and gold mine in Umnugovi province in the Gobi Desert. It’s a huge deal in Mongolia because it is the “world’s largest undeveloped copper-gold project”1 and a political minefield par excellence. Oyu Tolgoi has a very active public relations machine, and I recently decided to conduct a simple analysis of the Oyu Tolgoi website to see the kinds of words the company uses to describe itself and its activities.

Before going further, though, here’s an additional bit of context to put the Oyu Tolgoi project in perspective. According to Turquoise Hill Resources, the total capital investment in the project at the end of the second quarter of 2012 was just over $5 billion USD.2 Mongolia’s GDP was estimated by the World Bank to be approximately $8.5 billion USD in 2011. The investment is therefore equal to nearly 60 percent of the country’s annual GDP. In US terms, that would be like a company plowing $9 trillion USD into the US economy. Any way you slice it, it is a huge deal, and one can imagine that Oyu Tolgoi’s communications and media departments are never without something to do.

The analysis I conducted involved crawling the Oyu Tolgoi website (www.ot.mn) and downloading unique textual content from its web pages.3 In total, I downloaded 522 pages written in Mongolian and 520 pages written in English, comprising 345,034 and 359,278 words, respectively. The objective was to rank words by frequency and identify the top 100 words used on the site in each language. It was an exploratory analysis, and I did not have any prior assumptions or hypotheses about what I might find.
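For readers curious about the mechanics, the ranking step can be sketched in a few lines of Python. This is a minimal illustration and not the script used for the analysis: the names `tokenize` and `top_words`, the sample sentence, and the tiny stop-word set are all assumptions, and the sketch presumes the page text has already been stripped of HTML.

```python
import re
from collections import Counter

# Matches runs of Unicode letters only, so punctuation and digits are dropped.
# The same pattern works for Latin and Cyrillic text.
WORD_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only words."""
    return WORD_RE.findall(text.lower())


def top_words(text, stop_words=frozenset(), n=100):
    """Return the n most frequent words as (word, count) pairs, skipping stop words."""
    counts = Counter(w for w in tokenize(text) if w not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the crawled pages.
    sample = "Oyu Tolgoi is a mine. The Oyu Tolgoi mine is in the Gobi."
    print(top_words(sample, stop_words={"is", "a", "the", "in"}, n=3))
```

Matching on letters rather than splitting on whitespace removes punctuation and numeric characters in one pass, which is convenient when the same code has to handle both English and Mongolian pages.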

The images below are frequency clouds derived from the rank listing of the top 100 words used in each language.4 The sizes of the words are proportional to their frequencies with bigger words more frequently used than smaller words.
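The proportional sizing can be illustrated with a short Python sketch. The tool that actually rendered the clouds is not stated here, so the function name `font_sizes`, the point sizes, and the minimum-size floor are assumptions made purely for illustration.

```python
def font_sizes(word_counts, min_size=10.0, max_size=72.0):
    """Map each word to a font size proportional to its count.

    The most frequent word gets max_size; the min_size floor (an assumption,
    not part of a strictly proportional scheme) keeps rare words legible.
    """
    if not word_counts:
        return {}
    top = max(count for _, count in word_counts)
    return {
        word: max(min_size, max_size * count / top)
        for word, count in word_counts
    }


if __name__ == "__main__":
    # Hypothetical counts, not figures from the analysis.
    print(font_sizes([("oyu", 200), ("tolgoi", 100), ("rare", 1)]))
```

With these made-up counts, a word used half as often as the top word is drawn at half the size, while very rare words are held at the floor.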

Frequency cloud in English

Not surprisingly, “Oyu” and “Tolgoi” take the top ranking in both languages. My wife, looking over my shoulder as she walked by while I was examining the images, pointed out that the words “copper” and “gold” do not appear in the clouds (I would have noticed that on my own eventually, my inflated sense of pride compels me to think). Indeed, they don’t. “Copper” is the 172nd and “gold” the 177th most frequently used word in English, and зэсалтны (copper-gold), the first such word to appear in the Mongolian list, ranks 203rd.5 One could be forgiven for guessing, just by looking at the clouds, that Oyu Tolgoi produces more “scholarships” than it does copper. I don’t think one should read too much into it, but it is an interesting result nonetheless.
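Looking up where a given word falls in the ranking is a simple operation on the frequency counts. The sketch below assumes the counts live in a `collections.Counter`; the helper name `word_rank` and the sample counts are hypothetical, and ties are broken by order of first appearance, which may differ from how the ranks quoted above were assigned.

```python
from collections import Counter


def word_rank(counts, word):
    """Return the 1-based frequency rank of word, or None if it never occurs."""
    # most_common() with no argument returns every word, highest count first.
    for rank, (candidate, _) in enumerate(counts.most_common(), start=1):
        if candidate == word:
            return rank
    return None


if __name__ == "__main__":
    # Hypothetical counts, not figures from the analysis.
    counts = Counter({"oyu": 5, "tolgoi": 4, "scholarships": 2, "copper": 1})
    print(word_rank(counts, "copper"))
    print(word_rank(counts, "gold"))
```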

Frequency cloud in Mongolian

The frequency clouds provide a fun way to visualize the words Oyu Tolgoi is using in its public relations efforts. There are probably other interesting patterns to be discovered if one stares at the images long enough.

This analysis was part of a much larger effort on my part to develop methods for conducting large-scale textual analysis of Mongolian content, and I hope to share other interesting results from those efforts in the future.

Footnotes
1. Turquoise Hill Resources, see http://www.ivanhoemines.com/s/Oyu_Tolgoi.asp?ReportID=379189.
2. Ibid.
3. I used the Python-based crawler framework “Scrapy” for this task. All content was stripped of HTML, punctuation, and numeric characters for the analysis.
4. Uninformative “stop words” were removed from the list to create the clouds. In English pages, these were: and, the, of, in, to, a, for, is, are, with, on, will, be, at, or, as, by, an, from, facebook, twitter, search, skip, copyright, main, and site. In Mongolian pages, these were: нь, вэ, байна, ба, буй, to, юм, тухайоюу, search, skip, тухай, дээр, facebook, twitter, oyu, tolgoi, copyright, main, site, болно, бөгөөд, and болон.
5. There were 8,887 unique words in the English content and 12,551 unique words in the Mongolian content.