Analysing 11 million Ghanaian names to maximise a hit song's reach
Introduction
Background
2010 was a big year in Ghana's history; we struck black gold, inspired many around the world to sing and dance like us, and kicked our way through to the World Cup quarter finals.
On that last part, it really was a big deal. And we could have gotten even further if not for a certain Uruguayan player whose actions—I'm certain—increased the country's cardiac mortality rate that year.
To top things off, the one chance we got to right his wrongs was blown when our star striker, Asamoah Gyan, sent the ball—and many electrocardiographs—off the goalpost.
This moment inspired much hatred and legit concern for the man's life (Ghanaians don't play when it comes to football). But what saved him was that months before this historic penalty miss, under the stage name 'Baby Jet', he released a hit song 'African Girls' in collaboration with his musician friend Castro[1]. The love people had for it offset some of the resentment many would have for him. And I'm not surprised, it's a cute song.
The Song
The hit made its rounds on every radio station and in all chop bars (the one in my neighbourhood couldn't stop playing it). In the song, the duo fawn over Ghanaian women, talking about how beautiful they are ("Africans girls dey be", "mo ahoɔfɛ kɛkɛ, na ɛseɛ y'akoma") and how crazy it makes them ("Cos you turn me on, you bring me joy, you make me kolo eh"). You can see the full lyrics here.
Anyway, the song popped up on my queue last week, and I don't know why, but this part (in bold) really jumped out to me:
Ghana mmaa, mo hoɔ fɛ, moadi first
No retreat, no surrender, mo
ahoɔfɛ kɛkɛ, na ɛseɛ y'akoma
Linda, Barbara, Monica, Jessica,
Pamela, Sarah, Gifty, na Diana
Mo nyinaa, come on, moyɛ ogboo
In the bridge, they list some common female Ghanaian names like Linda, Jessica, and others[2]. And I wondered: Why these names in particular? Sure, it's clear they were trying to rhyme with lines before (surrender, akoma), but that can't be all.
It only makes sense that they wanted to touch as many Ghana girls' hearts as they could. You know that feeling when you hear your name in a nice song? Feels like a shout-out[3].
So Castro and Baby Jet's goal must've been to reach the most Ghana girls with that one line, rhyme scheme and all. Then I asked: are the current eight (8) names really the right combo to reach the most Ghana girls? And if not, which combo might work better?
This, esteemed readers, is what I was fussing over at 1 am last Saturday. It didn't help that I happened to have a really large dataset of Ghanaian names (don't ask me how I got this). Time to go spelunking!
Diving into the data
Data Prep
We start with a cleaned[4] 100 MB txt file of given names (one per line) for about 11 million Ghanaians[5].
nl -ba given_names.txt | tail -n 10
13254809 mark
13254810 emmanuel
13254811 charlotte
13254812 stephen
13254813 patricia
13254814 esi
13254815 azumi
13254816 chris
13254817 yaw
13254818 kwaku
The list definitely includes duplicates, so let's create a frequency table: a mapping from each given name to the number of times it shows up in the dataset.
from collections import Counter

with open("given_names.txt") as f:
    given_names = f.read().splitlines()

frequency_table = Counter(given_names)

print(f"Total given names: {len(given_names):,}")
print(f"Unique given names: {len(frequency_table):,}")
print(f"Duplicates: {len(given_names) - len(frequency_table):,}")
# Output
Total given names: 13,254,818
Unique given names: 625,687
Duplicates: 12,629,131
Great! We have 625,687 unique Ghanaian given names, and a good estimate of how many people share each name.
Calculate total reach of the names used in the current song
Now, let's calculate the song's reach: how many Ghana girls could've heard their names. The lyric mentions eight. To calculate reach, we simply sum the associated frequencies for each of them[6].
| linda | barbara | monica | jessica | pamela | sarah | gifty | diana | REACH |
|---|---|---|---|---|---|---|---|---|
| 35,433 | 3,903 | 27,388 | 4,932 | 1,348 | 48,125 | 61,891 | 39,326 | 222,346 |
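As a minimal sketch of that sum: the `song_counts` mapping below just copies the eight counts from the table above, and `reach` is a helper name I've made up for illustration. In the real analysis the counts would come straight from `frequency_table`.

```python
from collections import Counter

# Counts copied from the table above (lowercase name -> number of entries).
song_counts = Counter({
    "linda": 35_433, "barbara": 3_903, "monica": 27_388, "jessica": 4_932,
    "pamela": 1_348, "sarah": 48_125, "gifty": 61_891, "diana": 39_326,
})

SONG_NAMES = ["linda", "barbara", "monica", "jessica",
              "pamela", "sarah", "gifty", "diana"]


def reach(names, table):
    """Sum the counts of the given names; a Counter returns 0 for unknown names."""
    return sum(table[name] for name in names)


print(f"{reach(SONG_NAMES, song_counts):,}")  # 222,346
```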
Woah! That's more than two hundred thoozin Ghana girls who would've felt that shout-out, including one of my favourite aunties who still smiles when the song comes on.
But again, to the original question, can we do better? Can we help Castro and Baby Jet's song reach many more Ghana girls with a different set of names? Let's see.
Can we do better?
We want more Ghana girls to feel validated, but there are some constraints. First, we only have 8 slots available in that line. And second, we have to maintain the rhyme scheme.
This sounds like a constrained selection problem: pick 8 names that maximise reach while satisfying a fuzzy constraint (the rhyme scheme).

There are multiple ways to do this, but we don't need anything fancy like dynamic programming or whatever. We can greedily pick the name that gets us the most reach for each slot.
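Here's a rough sketch of that greedy idea, using a toy table and a hypothetical helper called `greedy_pick` (neither is from the actual analysis). Since reach is just a sum of per-name counts, taking the most common allowed name for each slot should give the same answer as a fancier search, as long as the constraint can be checked one name at a time.

```python
from collections import Counter


def greedy_pick(table, slots, is_allowed=lambda name: True):
    """Fill `slots` positions with the most common names that pass `is_allowed`."""
    picked = []
    for name, _count in table.most_common():  # most common first
        if len(picked) == slots:
            break
        if is_allowed(name):
            picked.append(name)
    return picked


# Toy counts, made up for illustration only.
toy = Counter({"ama": 50, "kofi": 40, "esi": 30, "yaw": 20, "abena": 10})

print(greedy_pick(toy, 2))                                       # ['ama', 'kofi']
print(greedy_pick(toy, 2, lambda n: n not in {"kofi", "yaw"}))   # ['ama', 'esi']
```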
First, get the top-n most common names,
We do this with the code below:
print("Name Frequency")
print("-" * 30)
for name, freq in frequency_table.most_common(20):
    print(f"{name:<20} {freq:>8,}")
# Output
Name Frequency
------------------------------
emmanuel 194,104
mary 162,935
kofi 135,028
samuel 133,294
isaac 119,167
kwame 114,393
comfort 107,159
elizabeth 99,714
grace 94,430
joseph 93,505
francis 90,152
john 88,803
daniel 88,473
yaw 85,217
esther 85,004
kwaku 83,272
stephen 77,948
ama 76,254
janet 74,061
mohammed 73,532
I sent this to my Ghanaian friends and family, and everyone agreed. Pick any random person on the street, ask if they know an 'Emmanuel', and they'll point to someone nearby. Many people can relate to having an uncle named 'Kofi' or an auntie named 'Comfort'. I'm curious if 'Elizabeth' comes from the late British queen. All in all, the list tracks: Biblical names (Ghana is majority Christian), Muslim names ('Mohammed'), Akan day names ('Kwaku', 'Kofi', 'Yaw'), and old English names (from British influence).
Great! This list makes sense as names with the most reach. But there's a problem—most of these are traditionally male names. Unfortunately, the raw dataset is just of given names; we don't have info on the sex of the people with those names.
then exclude male names,
625,000 names is too many to manually filter. But here's the thing: the names in the dataset follow a power law (specifically, Zipf's law). A handful of names account for most people, while the long tail contains hundreds of thousands of rarer names.
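One way to get a feel for how top-heavy the list is: measure what share of all entries the k most common names cover. The sketch below uses a hypothetical helper, `top_k_share`, and plugs in the top-20 counts and the total printed earlier. It's an illustration, not part of the original analysis.

```python
TOTAL_ENTRIES = 13_254_818  # "Total given names" from the earlier output

# Top-20 counts, copied from the most-common listing earlier in the post.
TOP_20_COUNTS = [
    194_104, 162_935, 135_028, 133_294, 119_167, 114_393, 107_159,
    99_714, 94_430, 93_505, 90_152, 88_803, 88_473, 85_217,
    85_004, 83_272, 77_948, 76_254, 74_061, 73_532,
]


def top_k_share(counts, k, total):
    """Fraction of all entries covered by the k largest counts."""
    return sum(sorted(counts, reverse=True)[:k]) / total


print(f"{top_k_share(TOP_20_COUNTS, 20, TOTAL_ENTRIES):.1%}")  # 15.7%
```

So 20 names out of 625,687 already cover roughly 16% of all entries. With the full table in hand, the counts from `frequency_table.most_common(k)` could be fed in the same way for larger k.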

So we only need the lion's share and don't have to concern ourselves with the long tail[7]. We can start with the top 20 names, and manually exclude the male names until the entire list is of female names (it took 3 iterations). The result:
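# Male names spotted in the top-20 listing above. The full list grew over
# three manual passes; names added in the later passes are not reproduced here.
TOP_MALE_NAMES = {
    "emmanuel", "kofi", "samuel", "isaac", "kwame", "joseph", "francis",
    "john", "daniel", "yaw", "kwaku", "stephen", "mohammed",
}

frequency_table_without_top_male_names = Counter(
    {
        name: freq
        for name, freq in frequency_table.items()
        if name.lower() not in TOP_MALE_NAMES
    }
)

print("Name Frequency")
print("-" * 30)
for name, freq in frequency_table_without_top_male_names.most_common(20):
    print(f"{name:<20} {freq:>8,}")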
# Output
Name Frequency
------------------------------
mary 162,935
comfort 107,159
elizabeth 99,714
grace 94,430
esther 85,004
ama 76,254
janet 74,061
abena 66,960
gifty 61,891
yaa 58,556
agnes 57,823
faustina 56,901
akua 54,881
victoria 53,254
joyce 52,733
vida 51,999
gladys 51,766
rebecca 51,424
abigail 51,195
patience 50,682
to get the Ghana girls' names with the most reach,
We're getting closer! Let's prepare the list of names to whisper into Castro and Baby Jet's heads as they were writing the song lyrics. If we're naively selecting, we just pick the top 8 above:
| before: | linda | barbara | monica | jessica | pamela | sarah | gifty | diana | REACH |
|---|---|---|---|---|---|---|---|---|---|
| | 35,433 | 3,903 | 27,388 | 4,932 | 1,348 | 48,125 | 61,891 | 39,326 | 222,346 |
| after: | mary | comfort | elizabeth | grace | esther | ama | janet | abena | REACH |
| | 162,935 | 107,159 | 99,714 | 94,430 | 85,004 | 76,254 | 74,061 | 66,960 | 766,517 |
Boom! We could triple the reach! That's 544,171 more Ghana girls feeling called. We did it! We've helped Castro and Baby Jet optimise their song!
Or...did we?
Here, listen to the original:
Now, try singing these names to the melody:
🎵 Mary, Comfort, Elizabeth, Grace, Esther, Ama, Janet, Abena 🎵
Yeah, you see the problem. Sounds terrible. These names break the lyrical flow completely!
while keeping the rhyme.
So yes, we could get a better name combo, but rhyming matters (duh). This part is more heuristic and fuzzy, more art than pure optimisation. In the original, Castro and Baby Jet use names that end with the 'ah' (ä) sound when pronounced in Ghanaian English[8] (like 'Sarah', 'Diana', 'Monica'), with 'Gifty' thrown in for breathing room.
Our first selection fails because of names like Janet, Grace (sorry, Auntie), Comfort, and Elizabeth (sorry, your majesty, RIP). We need to redo the selection, but this time, let's pick only candidates with that sound: seven names that end in [-a, -ah, -er] (which all sound like ä in Ghanaian pronunciation), and one that ends in [-ty].
The dataset doesn't include pronunciations, so I'm relying on my understanding[9] of how Ghanaians actually say these names. Let's write a simple filter for names ending in 'a', 'ah', 'er', or 'ty' (but not double-a 'aa' endings like 'Naa', which sound very different from what we want).
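Before running it on the full table, here's a quick sanity check of the pattern on a few hand-picked names (the `samples` list and the `RHYME` name are mine, just for illustration). The leading `[^a]` is what rejects the double-a endings.

```python
import re

# Same pattern as the filter: a non-'a' character, then one of the endings.
RHYME = re.compile(r"[^a](a|ah|er|ty)$")

samples = ["sarah", "diana", "esther", "gifty", "naa", "yaa", "grace", "janet"]
for name in samples:
    verdict = "keep" if RHYME.search(name) else "drop"
    print(f"{name:<8} {verdict}")
# The first four names are kept; 'naa', 'yaa', 'grace' and 'janet' are dropped.
```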
import re

frequency_table_with_rhyme_without_top_male_names = Counter(
    {
        name: freq
        for name, freq in frequency_table_without_top_male_names.items()
        if re.search(r"[^a](a|ah|er|ty)$", name)
    }
)

print("Name Frequency")
print("-" * 30)
for name, freq in frequency_table_with_rhyme_without_top_male_names.most_common(25):
    print(f"{name:<20} {freq:>8,}")
Filtering with r"[^a](a|ah|er|ty)$" to get the top 25 names that rhyme better
The result:
Name Frequency
------------------------------
esther 85,004
ama 76,254
abena 66,960
gifty 61,891
faustina 56,901
akua 54,881
victoria 53,254
vida 51,999
rebecca 51,424
felicia 49,217
akosua 48,748
sarah 48,125
adwoa 47,645
lydia 45,224
cecilia 43,401
diana 39,326
hannah 39,265
christiana 38,697
georgina 37,910
afia 36,889
rita 36,658
linda 35,433
juliana 33,647
amina 32,806
regina 32,805
Ahaan, that's more like it. Looking at the list, Castro and Baby Jet definitely slapped when they picked 'Gifty'—it's the most common name ending in 'ty' (3x more than 'Charity', the next most common -ty name). So 'Gifty' is locked in.
Now for the seven 'ah' names: this is where it gets fun.
The thing is, almost any combo from these 25 candidates will beat the original's 222k reach. The data constraint is solved.
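To put numbers on that, here's a small sketch that bounds the reach of every possible combo: 'Gifty' plus the seven least common candidates at one end, 'Gifty' plus the seven most common at the other. The counts are copied from the list above; the variable names are my own.

```python
# The 24 'ah'-sounding candidates and their counts, copied from the list above.
AH_CANDIDATES = {
    "esther": 85_004, "ama": 76_254, "abena": 66_960, "faustina": 56_901,
    "akua": 54_881, "victoria": 53_254, "vida": 51_999, "rebecca": 51_424,
    "felicia": 49_217, "akosua": 48_748, "sarah": 48_125, "adwoa": 47_645,
    "lydia": 45_224, "cecilia": 43_401, "diana": 39_326, "hannah": 39_265,
    "christiana": 38_697, "georgina": 37_910, "afia": 36_889, "rita": 36_658,
    "linda": 35_433, "juliana": 33_647, "amina": 32_806, "regina": 32_805,
}
GIFTY = 61_891            # the locked-in '-ty' slot
ORIGINAL_REACH = 222_346  # reach of the eight names in the song

counts = sorted(AH_CANDIDATES.values())
worst = GIFTY + sum(counts[:7])    # seven least common candidates
best = GIFTY + sum(counts[-7:])    # seven most common candidates

print(f"worst combo: {worst:,}")   # 308,039
print(f"best combo:  {best:,}")    # 507,144
print(worst > ORIGINAL_REACH)      # True
```

On these numbers, even the seven least common candidates plus 'Gifty' come out ahead of the original, and the best case is a bit over double.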
But which combo actually sounds best? Which has the right rhythm when you sing it? That's subjective and more artistry, so let's make it democratic.
That's where you 🫵🏿 come in. I've built an interactive tool where you can pick your own combo, see how many Ghana girls it might reach, and vote on combinations others have made.
Alright, Your Turn!
Below are the top 25 names that fit the rhyme scheme. 'Gifty' is already locked in (they nailed that one!). Pick 7 other names that would make the best remix, whether you're chasing maximum reach, perfect flow, or just vibes.
Conclusion
So there you have it! With some data analysis, we've shown that Castro and Baby Jet could've doubled their reach while keeping the rhyme scheme. The math checks out. Whether they'd trade 'Barbara' for 'Felicia' or 'Diana' for 'Abena' is another question entirely, and up to you to decide!
Vote for your favourite remix, or create your own. Who knows—maybe someone will actually record it 😆 And if they do, just remember: you heard it here first. 🎵
P.S. If you're doing actual research into Ghanaian onomastics or name patterns[10], I'd genuinely love to chat.