Analysing 11 million Ghanaian names to maximise a hit song's reach

Introduction

Background

2010 was a big year in Ghana's history; we struck black gold, inspired many around the world to sing and dance like us, and kicked our way through to the World Cup quarter finals.

On that last part, it really was a big deal. And we could have gotten even further if not for a certain Uruguayan player whose actions—I'm certain—increased the country's cardiac mortality rate that year.

To top things off, the one chance we got to right his wrongs was blown when our star striker, Asamoah Gyan, sent the ball—and many electrocardiographs—off the goalpost.

Play Video: Asamoah Gyan penalty miss vs Uruguay

This moment inspired much hatred and legit concern for the man's life (Ghanaians don't play when it comes to football). But what saved him was that months before this historic penalty miss, under the stage name 'Baby Jet', he released a hit song 'African Girls' in collaboration with his musician friend Castro[1]. The love people had for it offset some of the resentment many would otherwise have had for him. And I'm not surprised: it's a cute song.

The Song

The hit made its rounds on every radio station and in all chop bars (the one in my neighbourhood couldn't stop playing it). In the song, the duo fawn over Ghanaian women, talking about how beautiful they are ("Africans girls dey be", "mo ahoɔfɛ kɛkɛ, na ɛseɛ y'akoma") and how crazy it makes them ("Cos you turn me on, you bring me joy, you make me kolo eh"). You can see the full lyrics here.

Play Video: African Girls by Castro ft. Baby Jet

Anyway, the song popped up on my queue last week, and I don't know why, but this part (in bold) really jumped out to me:

Ghana mmaa, mo hoɔ fɛ, moadi first

No retreat, no surrender, mo

ahoɔfɛ kɛkɛ, na ɛseɛ y'akoma

Linda, Barbara, Monica, Jessica,

Pamela, Sarah, Gifty, na Diana

Mo nyinaa, come on, moyɛ ogboo

In the bridge, they list some common female Ghanaian names like Linda, Jessica, and others[2]. And I wondered: Why these names in particular? Sure, it's clear they were trying to rhyme with lines before (surrender, akoma), but that can't be all.

It only makes sense that they wanted to touch as many Ghana girls' hearts as they could. You know that feeling when you hear your name in a nice song? Feels like a shout-out[3].

So Castro and Baby Jet's goal must've been to reach the most Ghana girls with that one line, rhyme scheme and all. Then I asked: are the current eight (8) names really the right combo to reach the most Ghana girls? And if not, which combo might work better?

This, esteemed readers, is what I was fussing over at 1 am last Saturday. It didn't help that I happened to have a really large dataset of Ghanaian names (don't ask me how I got this). Time to go spelunking!

Diving into the data

Data Prep

We start with a cleaned[4] 100 MB txt file of given names (one per line) for about 11 million Ghanaians[5].

13254809        mark
13254810        emmanuel
13254811        charlotte
13254812        stephen
13254813        patricia
13254814        esi
13254815        azumi
13254816        chris
13254817        yaw
13254818        kwaku
The last 10 lines in the txt file: nl -ba given_names.txt | tail -n 10

The list definitely includes duplicates, so let's create a frequency table: a mapping of each given name and how many times it shows up in the dataset.

from collections import Counter

# Read the whole file into a list, one given name per line.
with open("given_names.txt") as f:
    given_names = f.read().splitlines()

# Map each distinct name to the number of times it appears.
frequency_table = Counter(given_names)

print(f"Total given names: {len(given_names):,}")
print(f"Unique given names: {len(frequency_table):,}")
print(f"Duplicates: {len(given_names) - len(frequency_table):,}")
# Output
Total given names: 13,254,818
Unique given names: 625,687
Duplicates: 12,629,131
Create a frequency table for the names and see some stats on the duplicates

Great! We have 625,687 unique Ghanaian given names, and a good estimate of how many people share each name.

Calculate total reach of the names used in the current song

Now, let's calculate the song's reach: how many Ghana girls could've heard their names. The lyric mentions eight. To calculate reach, we simply sum the associated frequencies for each of them[6].

linda    barbara  monica   jessica  pamela   sarah    gifty    diana    REACH
35,433   3,903    27,388   4,932    1,348    48,125   61,891   39,326   222,346
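As a quick check of that arithmetic, here is a minimal sketch of the reach calculation. It hard-codes the eight frequencies from the table above instead of loading the full dataset, and `reach` is a helper name made up for illustration.

```python
from collections import Counter

# Frequencies for the eight names in the lyric, copied from the table above.
frequency_table = Counter({
    "linda": 35_433, "barbara": 3_903, "monica": 27_388, "jessica": 4_932,
    "pamela": 1_348, "sarah": 48_125, "gifty": 61_891, "diana": 39_326,
})

def reach(names, table):
    """Sum the frequencies of the given names (a Counter returns 0 for unknown names)."""
    return sum(table[name] for name in names)

original_names = ["linda", "barbara", "monica", "jessica",
                  "pamela", "sarah", "gifty", "diana"]
original_reach = reach(original_names, frequency_table)
print(f"{original_reach:,}")  # 222,346
```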

Woah! That's more than two hundred thoozin Ghana girls who would've felt that shout-out, including one of my favourite aunties who still smiles when the song comes on.

But again, to the original question, can we do better? Can we help Castro and Baby Jet's song reach many more Ghana girls with a different set of names? Let's see.

Can we do better?

We want more Ghana girls to feel validated, but there are some constraints. First, we only have 8 slots available in that line. And second, we have to maintain the rhyme scheme.

This sounds like a constrained selection problem: pick 8 names that maximise reach while satisfying a fuzzy constraint (the rhyme scheme).

Equation for the constrained optimisation problem
Problem written in math form
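In symbols, one way to state it (a sketch, with notation assumed here for illustration): let N be the set of candidate female names, f(n) the number of people in the dataset with name n, R the subset of N that fits the rhyme scheme, and S the set of names we put in the line.

```latex
\max_{S \subseteq N} \; \sum_{n \in S} f(n)
\quad \text{subject to} \quad |S| = 8, \quad S \subseteq R
```

The fuzzy part is R: which names count as rhyming is a judgment call, not something the data can settle.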

There are multiple ways to do this, but we don't need anything fancy like dynamic programming or whatever. Reach is just a sum of per-name counts, so we can greedily pick the name that gets us the most reach for each slot and still end up with the best total.
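A minimal sketch of that greedy pick, on made-up toy counts. The function name `greedy_pick` and the `is_allowed` hook are assumptions for illustration; the hook stands in for whatever constraint we decide on later.

```python
from collections import Counter

def greedy_pick(table, slots, is_allowed=lambda name: True):
    """Fill `slots` positions, each time taking the most common allowed name left."""
    picked = []
    for name, _freq in table.most_common():  # names sorted by descending count
        if len(picked) == slots:
            break
        if is_allowed(name):
            picked.append(name)
    return picked

# Toy counts, not real data.
toy = Counter({"mary": 5, "kofi": 4, "ama": 3, "grace": 2, "yaa": 1})

print(greedy_pick(toy, 2))                                        # ['mary', 'kofi']
print(greedy_pick(toy, 2, is_allowed=lambda n: n.endswith("a")))  # ['ama', 'yaa']
```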

First, get the top-n most common names,

We do this with the code below:

print("Name                Frequency")
print("-" * 30)
for name, freq in frequency_table.most_common(20):
    print(f"{name:<20} {freq:>8,}")
# Output
Name                Frequency
------------------------------
emmanuel            194,104
mary                162,935
kofi                135,028
samuel              133,294
isaac               119,167
kwame               114,393
comfort             107,159
elizabeth            99,714
grace                94,430
joseph               93,505
francis              90,152
john                 88,803
daniel               88,473
yaw                  85,217
esther               85,004
kwaku                83,272
stephen              77,948
ama                  76,254
janet                74,061
mohammed             73,532
Top 20 most common names and their frequencies

I sent this to my Ghanaian friends and family, and everyone agreed. Pick any random person on the street, ask if they know an 'Emmanuel', and they'll point to someone nearby. Many people can relate to having an uncle named 'Kofi' or an auntie named 'Comfort'. I'm curious if 'Elizabeth' comes from the late British queen. All in all, the list tracks: Biblical names (Ghana is majority Christian), Muslim names ('Mohammed'), Akan day names ('Kwaku', 'Kofi', 'Yaw'), and old English names (from British influence).

Great! This list makes sense as names with the most reach. But there's a problem—most of these are traditionally male names. Unfortunately, the raw dataset is just of given names; we don't have info on the sex of the people with those names.

then exclude male names,

625,000 names is too many to manually filter. But here's the thing: the names in the dataset follow a power law (specifically, Zipf's law). A handful of names account for most people, while the long tail contains hundreds of thousands of rarer names.

Zipf's law distribution of Ghanaian names
Distribution of Ghanaians' given names in the frequency table
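To make "a handful of names account for most people" concrete, here is a small sketch that measures what share of all entries the top k names cover. The counts are made up with a Zipf-like shape (frequency roughly proportional to 1/rank), and `top_k_share` is a hypothetical helper, not something from the original analysis.

```python
from collections import Counter

def top_k_share(table, k):
    """Fraction of all name occurrences covered by the k most common names."""
    total = sum(table.values())
    covered = sum(freq for _name, freq in table.most_common(k))
    return covered / total

# Made-up counts: the name at rank r appears about 1000 / r times.
toy = Counter({f"name{rank}": 1000 // rank for rank in range(1, 101)})

for k in (1, 10, 50):
    print(f"top {k:>2} names cover {top_k_share(toy, k):.0%} of entries")
```

Running the same helper on the real `frequency_table` would show how quickly the curve flattens, and so how few names need a manual look.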

So we only need the lion's share and needn't concern ourselves with the long tail[7]. We can start with the top 20 names, and manually exclude the male names until the entire list is of female names (it took 3 iterations). The result:

# Male names excluded by hand over the three passes (list not shown here).
TOP_MALE_NAMES = []
frequency_table_without_top_male_names = Counter(
    {
        name: freq
        for name, freq in frequency_table.items()
        if name.lower() not in TOP_MALE_NAMES
    }
)

print("Name                Frequency")
print("-" * 30)
for name, freq in frequency_table_without_top_male_names.most_common(20):
    print(f"{name:<20} {freq:>8,}")
# Output
Name                Frequency
------------------------------
mary                162,935
comfort             107,159
elizabeth            99,714
grace                94,430
esther               85,004
ama                  76,254
janet                74,061
abena                66,960
gifty                61,891
yaa                  58,556
agnes                57,823
faustina             56,901
akua                 54,881
victoria             53,254
joyce                52,733
vida                 51,999
gladys               51,766
rebecca              51,424
abigail              51,195
patience             50,682
Top 20 most common female names
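Here is a minimal sketch of the first pass of that manual exclusion, using only the first top-20 table. The 13 names in `male_names_pass_1` are simply the ones that appear in the first list but not in the female list above. The names dropped in the later passes aren't shown in this post, so they're left out.

```python
from collections import Counter

# The first top-20 table, copied from above.
top_20 = Counter({
    "emmanuel": 194_104, "mary": 162_935, "kofi": 135_028, "samuel": 133_294,
    "isaac": 119_167, "kwame": 114_393, "comfort": 107_159, "elizabeth": 99_714,
    "grace": 94_430, "joseph": 93_505, "francis": 90_152, "john": 88_803,
    "daniel": 88_473, "yaw": 85_217, "esther": 85_004, "kwaku": 83_272,
    "stephen": 77_948, "ama": 76_254, "janet": 74_061, "mohammed": 73_532,
})

# Pass 1: male names spotted by eye in the table above.
male_names_pass_1 = {
    "emmanuel", "kofi", "samuel", "isaac", "kwame", "joseph", "francis",
    "john", "daniel", "yaw", "kwaku", "stephen", "mohammed",
}

remaining = Counter(
    {name: freq for name, freq in top_20.items() if name not in male_names_pass_1}
)
for name, freq in remaining.most_common():
    print(f"{name:<20} {freq:>8,}")
```

Only seven names survive this pass, which is why it takes a few more rounds over the full table to fill a clean top 20.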

to get the Ghana girls' names with the most reach,

We're getting closer! Let's prepare the list of names to whisper into Castro and Baby Jet's heads as they were writing the song lyrics. If we're naively selecting, we just pick the top 8 above:

before:  linda      barbara    monica     jessica    pamela     sarah      gifty      diana      REACH
         35,433     3,903      27,388     4,932      1,348      48,125     61,891     39,326     222,346
after:   mary       comfort    elizabeth  grace      esther     ama        janet      abena      REACH
         162,935    107,159    99,714     94,430     85,004     76,254     74,061     66,960     766,517

Boom! We could triple the reach! That's 544,171 more Ghana girls feeling called. We did it! We've helped Castro and Baby Jet optimise their song!

Or...did we?

Here, listen to the original:

clip from "African Girls" by Castro and Baby Jet

Now, try singing these names to the melody:

🎵 Mary, Comfort, Elizabeth, Grace, Esther, Ama, Janet, Abena 🎵

Yeah, you see the problem. Sounds terrible. These names break the lyrical flow completely!

while keeping the rhyme.

So yes, we could get a better name combo, but rhyming matters (duh). This part is more heuristic and fuzzy, more art than pure optimisation. In the original, Castro and Baby Jet use names that end with the 'ah' (ä) sound when pronounced in Ghanaian English[8] (like 'Sarah', 'Diana', 'Monica'), with 'Gifty' thrown in for breathing room.

Our first selection fails because of names like Janet, Grace (sorry, Auntie), Comfort, and Elizabeth (sorry, your majesty, RIP). We need to redo the selection, but this time, let's pick only candidates with that sound: seven names that end in [-a, -ah, -er] (which all sound like ä in Ghanaian pronunciation), and one that ends in [-ty].

The dataset doesn't include pronunciations, so I'm relying on my understanding[9] of how Ghanaians actually say these names. Let's write a simple filter for names ending in 'a', 'ah', 'er', or 'ty' (but not double-a 'aa' endings like 'Naa', which sound very different from what we want).

import re

# Keep names ending in a / ah / er / ty, unless the letter just before that
# ending is an 'a' (which rules out double-a endings like 'naa').
frequency_table_with_rhyme_without_top_male_names = Counter(
    {
        name: freq
        for name, freq in frequency_table_without_top_male_names.items()
        if re.search(r"[^a](a|ah|er|ty)$", name)
    }
)

print("Name                Frequency")
print("-" * 30)
for name, freq in (
    frequency_table_with_rhyme_without_top_male_names.most_common(25)
):
    print(f"{name:<20} {freq:>8,}")
Using regular expression r"[^a](a|ah|er|ty)$" to filter for top 25 names that rhyme better
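To see what the pattern keeps and drops, here is a small sketch that runs it over a few sample names. The sample list is hand-picked for illustration.

```python
import re

# Same pattern as above: a non-'a' character, then one of the endings.
RHYME = re.compile(r"[^a](a|ah|er|ty)$")

samples = ["sarah", "diana", "esther", "gifty", "naa", "yaa", "grace", "janet", "comfort"]
for name in samples:
    verdict = "keep" if RHYME.search(name) else "drop"
    print(f"{name:<10} {verdict}")
```

The first four names are kept and the rest are dropped. One side effect worth knowing: the `[^a]` guard applies to every ending, not just the single 'a', so a name like 'katy' would be dropped too.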

The result:

Name                Frequency
------------------------------
esther               85,004
ama                  76,254
abena                66,960
gifty                61,891
faustina             56,901
akua                 54,881
victoria             53,254
vida                 51,999
rebecca              51,424
felicia              49,217
akosua               48,748
sarah                48,125
adwoa                47,645
lydia                45,224
cecilia              43,401
diana                39,326
hannah               39,265
christiana           38,697
georgina             37,910
afia                 36,889
rita                 36,658
linda                35,433
juliana              33,647
amina                32,806
regina               32,805
Top 25 female Ghanaian names that could rhyme better

Ahaan, that's more like it. Looking at the list, Castro and Baby Jet definitely slapped when they picked 'Gifty'—it's the most common name ending in 'ty' (3x more than 'Charity', the next most common -ty name). So 'Gifty' is locked in.

Now for the seven 'ah' names: this is where it gets fun.

The thing is, almost any combo from these 25 candidates will beat the original's 222k reach. The data constraint is solved.
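A quick sketch to back that up: with 'Gifty' locked in, it computes the best and worst possible reach from the other 24 candidates. The frequencies are copied from the table above.

```python
# The 25 rhyming candidates and their frequencies, copied from the table above.
candidates = {
    "esther": 85_004, "ama": 76_254, "abena": 66_960, "gifty": 61_891,
    "faustina": 56_901, "akua": 54_881, "victoria": 53_254, "vida": 51_999,
    "rebecca": 51_424, "felicia": 49_217, "akosua": 48_748, "sarah": 48_125,
    "adwoa": 47_645, "lydia": 45_224, "cecilia": 43_401, "diana": 39_326,
    "hannah": 39_265, "christiana": 38_697, "georgina": 37_910, "afia": 36_889,
    "rita": 36_658, "linda": 35_433, "juliana": 33_647, "amina": 32_806,
    "regina": 32_805,
}
LOCKED = "gifty"
ORIGINAL_REACH = 222_346

# Frequencies of the 24 non-locked names, most common first.
others = sorted((f for n, f in candidates.items() if n != LOCKED), reverse=True)

best = candidates[LOCKED] + sum(others[:7])    # seven most common + Gifty
worst = candidates[LOCKED] + sum(others[-7:])  # seven least common + Gifty

print(f"best:  {best:,}")   # 507,144
print(f"worst: {worst:,}")  # 308,039
```

Even the seven least common candidates plus 'Gifty' land above the original's 222,346.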

But which combo actually sounds best? Which has the right rhythm when you sing it? That's subjective and more artistry, so let's make it democratic.

That's where you 🫵🏿 come in. I've built an interactive tool where you can pick your own combo, see how many Ghana girls it might reach, and vote on combinations others have made.

Alright, Your Turn!

Below are the top 25 names that fit the rhyme scheme. 'Gifty' is already locked in (they nailed that one!). Pick 7 other names that would make the best remix, whether you're chasing maximum reach, perfect flow, or just vibes.

Interactive tool to pick your own combo of names at https://african-girls-boost.pages.dev

Conclusion

So there you have it! With some data analysis, we've shown that Castro and Baby Jet could've more than doubled their reach while keeping the rhyming scheme. The math checks out. Whether they'd trade 'Barbara' for 'Felicia' or 'Diana' for 'Abena' is another question entirely, and up to you to decide!

Vote for your favourite remix, or create your own. Who knows—maybe someone will actually record it 😆 And if they do, just remember: you heard it here first. 🎵

P.S. If you're doing actual research into Ghanaian onomastics or name patterns[10], I'd genuinely love to chat.