WhatsApp groups provide a space for collective conversations with people around the world.
In this tutorial, we'll generate and plot analytics based on the participants of a WhatsApp group. We'll geocode the users' locations and generate a country-level distribution. The project is built with Python using Selenium, Plotly, the Vonage Number Insight API, the Google Maps API, and the Mapbox API.
Vonage API Account
To complete this tutorial, you will need a Vonage API account. If you don’t have one already, you can sign up today and start building with free credit. Once you have an account, you can find your API Key and API Secret at the top of the Vonage API Dashboard.
Prerequisites
To follow and fully understand this tutorial, you'll also need to have:
Python 3.6 or newer.
Basic knowledge of automation with Selenium.
A Google Cloud account for accessing the Google Maps API.
Below are the results of the final interface you’ll build:
File Structure
See an overview of the file directory for this project below:
├── .env
├── README.md
├── analytics.py
├── automate.py
├── chromedriver
├── env
├── geocoding.py
├── main.py
├── plotting.py
└── requirements.txt
Apart from the downloaded chromedriver and the env virtual environment folder, you'll create every file in the tree above in this tutorial's subsequent steps.
Set up a Python Virtual Environment
You'll need an isolated environment to manage the Python dependencies unique to this project.
First, create a new development folder and move into it. In your terminal, run:
$ mkdir whatsapp-spatial-mapping
$ cd whatsapp-spatial-mapping
Next, create a new Python virtual environment. If you are using Anaconda, you can run the following command:
$ conda create -n env python=3.6
Then you can activate the environment using:
$ conda activate env
If you are using a standard distribution of Python, create a new virtual environment by running the command below:
$ python -m venv env
To activate the new environment on a Mac or Linux computer, run:
$ source env/bin/activate
If you are using a Windows computer, activate the environment as follows:
$ env\Scripts\activate
Regardless of the method you used to create and activate the virtual environment, your prompt should look like the following:
(whatsapp-spatial-mapping) $
Requirements File
Next, with the virtual environment active, create a requirements.txt file listing the project dependencies and their specific versions:
chart-studio==1.1.0
googlemaps==4.4.2
vonage==2.5.5
numpy==1.19.4
pandas==1.2.0
plotly==4.14.1
plotly-express==0.4.1
python-decouple==3.3
selenium==3.141.0
With those lines saved in requirements.txt, install the packages with their specific versions from your terminal:
$ pip install -r requirements.txt
Or, if you are on Anaconda:
$ conda install --file requirements.txt
And voila! All of the program's dependencies will be downloaded, installed, and ready to be used.
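To confirm that the pinned versions actually ended up in the active environment, you can run a small check like the one below. This is a sketch that assumes Python 3.8 or newer (for importlib.metadata) and assumes every line of requirements.txt uses the simple name==version form. parse_pins, installed_version, and check_pins are hypothetical helper names and are not part of any package.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, _, pinned = line.partition('==')
        pins[name.strip()] = pinned.strip()
    return pins


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    # Depending on the Python version, a hyphenated name such as 'chart-studio'
    # may only be found under its underscore form, so try both spellings.
    for candidate in (name, name.replace('-', '_')):
        try:
            return version(candidate)
        except PackageNotFoundError:
            continue
    return None


def check_pins(text):
    """Map each mismatched package to (pinned, installed); installed is None if missing."""
    problems = {}
    for name, pinned in parse_pins(text).items():
        installed = installed_version(name)
        if installed != pinned:
            problems[name] = (pinned, installed)
    return problems
```

For example, `print(check_pins(open('requirements.txt').read()))` run from the project folder should print an empty dict when every pin matches.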
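Alternatively, you can install the packages without pinning their versions:
Using pip:
$ pip install chart-studio googlemaps vonage numpy pandas plotly plotly-express python-decouple selenium
Using Conda:
$ conda install -c conda-forge chart-studio googlemaps vonage numpy pandas plotly plotly-express python-decouple selenium
Unpinned installs pull the latest releases. Newer major versions (for example, Selenium 4) may have changed APIs that this tutorial's code relies on, so the pinned requirements file is the safer route.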
Setting up APIs and Credentials
Next, you'll need to set up some accounts and get the required API credentials.
Google Maps API
The Google Maps API will enable the geocoding function, which is crucial to this project. The API is readily available on Google Cloud Console.
First, set up a Google Cloud free tier account, which comes with $300 in free credits to explore Google Cloud Platform products. Next, with your Google Cloud Console set up, create an API key to connect the Google Maps Platform to the application.
Finally, activate the Google Maps Geocoding API to enable it for the project.
Plotly API and Mapbox Credentials
This project uses Plotly for Python to create the data visualizations and Mapbox to enhance their aesthetics.
The Plotly plots are hosted online on Chart Studio (part of Plotly Enterprise), so you need to sign up, then generate and save your Plotly API key.
To achieve the desired plot enhancement, you also need to sign up for Mapbox and create a Mapbox authorization token.
Separation of Settings Parameters and Source Code
In the previous section, you've generated various API credentials.
It is best practice to store these credentials as environment variables instead of having them in your source code.
You can set up an environment file by creating a new file named .env, or via the terminal as follows:
(whatsapp-spatial-mapping) $ touch .env # create a new .env file
(whatsapp-spatial-mapping) $ nano .env # open the .env file
The environment file consists of key-value pair variables. For example:
user=Brain
secret=xxxxxxxxxxxxxxxxxxxxxxxxxx
You can access these environment variables in the source code using the config function from the python-decouple package (imported as decouple), for example config('user').
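Before running the scripts, it helps to confirm that the .env file defines every key they read. The sketch below uses the key names that appear in this tutorial's code (client_key, client_secret, api_key, mapbox_public_token, chart_studio_username, chart_studio_api). It relies on its own deliberately simple KEY=VALUE parser rather than python-decouple's. parse_env and missing_env_keys are hypothetical helpers.

```python
# Keys read via config(...) in analytics.py, geocoding.py and plotting.py
REQUIRED_KEYS = (
    'client_key', 'client_secret',                # Vonage (analytics.py)
    'api_key',                                    # Google Maps (geocoding.py)
    'mapbox_public_token',                        # Mapbox (plotting.py)
    'chart_studio_username', 'chart_studio_api',  # Chart Studio (plotting.py)
)


def parse_env(text):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#') or '=' not in line:
            continue
        key, _, value = line.partition('=')
        values[key.strip()] = value.strip()
    return values


def missing_env_keys(text, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty in the .env text."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

For example, `missing_env_keys(open('.env').read())` returns an empty list once all six keys have non-empty values.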
It's also good practice to add the .env file to your .gitignore file. Doing so prevents sensitive information such as API credentials from becoming public.
The scripts follow the Object-Oriented Programming paradigm. The following are high-level explanations for each script.
automate.py
The first step to this project workflow is WhatsApp automation using Selenium.
Selenium is an open-source web-based automation tool that requires a driver to control the browser. Different drivers exist due to various browser configurations; some of the popular browsers' drivers are listed below:
This tutorial uses the Chrome driver. To make it quick and easy to access, move the downloaded driver file to the same directory as the script utilizing it. See the file structure above.
This script comprises a WhatsappAutomation class that loads the web driver via its path, maximizes the browser window, and loads the WhatsApp Web application. The 30-second delay gives you time to scan the QR code and access your WhatsApp account on the web.
Upon scanning your QR code with your phone, your WhatsApp account opens on the web.
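The fixed 30-second sleep works, but it wastes time if you scan quickly and fails if you are slow. A common alternative is to poll until a condition holds or a timeout expires. Selenium provides this pattern as WebDriverWait. The stdlib-only sketch below shows the idea with a hypothetical wait_until helper. In this project, the predicate could check that the chat list element (id pane-side, which the group XPath in main.py relies on) has appeared. That check is an assumption about WhatsApp Web's markup.

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.5):
    """Call predicate() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value, or raises TimeoutError when the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} seconds')
        time.sleep(interval)


# Hypothetical usage inside WhatsappAutomation.__init__ (Selenium 3 API), in
# place of time.sleep(30); find_elements_by_id returns an empty (falsy) list
# until the element exists:
# wait_until(lambda: self.chrome_browser.find_elements_by_id('pane-side'))
```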
The WhatsappAutomation class has two methods:
get_contacts(), which opens the group and collects its participants' phone numbers
quit(), which closes the browser
The browser will notify you that "Chrome is being controlled by automated test software" to indicate that Selenium is now controlling the browser.
Next, you need to access the desired group and contacts, as shown below.
The automation step involves locating the WhatsApp Web page element that contains the phone numbers, as seen in the image above. There are numerous ways to select these elements, as highlighted in the Selenium documentation; this project uses XPath.
To find these element selectors, inspect the WhatsApp Web page with your browser's developer tools.
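To see what the contact XPath from main.py ('//*[@id="main"]/header/div[2]/div[2]/span') selects, here is a toy example that uses Python's built-in xml.etree.ElementTree, which supports a limited subset of XPath. The markup is invented for illustration and is much simpler than the real WhatsApp Web page, which changes over time. find_contacts_text is a hypothetical helper. The span's .text roughly stands in for the textContent attribute that get_contacts() reads.

```python
import xml.etree.ElementTree as ET

# A toy stand-in for the WhatsApp Web markup (not the real page structure).
TOY_PAGE = """
<html><body>
  <div id="main">
    <header>
      <div>group avatar</div>
      <div>
        <div>Group name</div>
        <div><span>+1 555 010 0000, +44 20 7946 0000, You</span></div>
      </div>
    </header>
  </div>
</body></html>
"""

# ElementTree only accepts relative paths, so the leading '//' becomes './/'.
# div[2] means "the second div child of its parent", as in browser XPath.
CONTACT_PATH = ".//*[@id='main']/header/div[2]/div[2]/span"


def find_contacts_text(markup, path=CONTACT_PATH):
    """Return the text of the first element matching path, or None."""
    root = ET.fromstring(markup.strip())
    element = root.find(path)
    return None if element is None else element.text
```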
Next, the contact entries obtained via the XPath need to be cleaned up and saved as a CSV file. You'll use regular expressions to remove any whitespace and the '+', '(', ')', and '-' characters from the phone numbers. Then you'll drop your own entry, which WhatsApp shows as 'You'.
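If you pull the cleaning steps into a pure function, you can test them without a browser. The sketch below uses a hypothetical clean_contacts helper and made-up sample numbers. It assumes, as get_contacts() does, that WhatsApp exposes participants as one comma-separated string with your own entry shown as You. It filters You after splitting, which is a small variation on the ",You" replacement in get_contacts().

```python
import re


def clean_contacts(raw):
    """Turn WhatsApp's comma-separated participant text into bare digit strings."""
    # Remove all whitespace, as get_contacts() does
    compact = re.sub(r"\s+", "", raw)
    # Remove '(', ')', '+' and '-'
    compact = re.sub(r"[()+-]", "", compact)
    # Filtering after splitting also catches 'You' when it is the first entry,
    # which the ",You" replacement in get_contacts() would miss.
    return [entry for entry in compact.split(',') if entry and entry != 'You']
```

For example, `clean_contacts('+1 (555) 010-0000, +44 20 7946 0000, You')` returns `['15550100000', '442079460000']`.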
To free up system resources, quit the Selenium-powered browser once the session is complete.
import time
import re
import csv
from selenium import webdriver


class WhatsappAutomation:
    def __init__(self):
        # Load ChromeDriver from the project directory (Selenium 3 API)
        self.chrome_browser = webdriver.Chrome('./chromedriver')
        self.chrome_browser.maximize_window()
        self.chrome_browser.get('https://web.whatsapp.com/')
        # Give yourself 30 seconds to scan the QR code with your phone
        time.sleep(30)

    def get_contacts(self, whatsapp_group_xpath, contact_element_xpath):
        # Open the WhatsApp group
        group = self.chrome_browser.find_element_by_xpath(whatsapp_group_xpath)
        group.click()
        time.sleep(10)
        # find_elements_by_xpath returns a list of elements
        contacts = self.chrome_browser.find_elements_by_xpath(
            contact_element_xpath)
        contacts = contacts[0].get_attribute('textContent')
        # Remove white space in the numbers
        contacts = re.sub(r"\s+", "", contacts)
        # Remove symbols such as '()+-'
        contacts = re.sub(r"[()+-]", "", contacts)
        # Your number is shown as 'You' on WhatsApp, so remove that as well
        contacts = contacts.replace(",You", "")
        # Convert the comma-separated string to a list
        contact_list = contacts.split(',')
        # Write the contacts to a CSV file, one number per row
        with open('contact_data.csv', 'w', newline='') as f:
            w = csv.writer(f)
            w.writerow(['contact'])  # header
            w.writerows([contact] for contact in contact_list)
        return contact_list

    def quit(self):
        print('Quitting session in 10 seconds...')
        time.sleep(10)
        self.chrome_browser.quit()
analytics.py
Next, you'll use the Vonage Number Insight API to generate insights from the contacts you collected. This API provides information about the validity, reachability, and roaming status of a phone number.
The script is made up of a WhatsappAnalytics class that first loads the Vonage credentials stored in the .env file using the python-decouple module. Next, its get_insights() method takes the contact list and runs an Advanced Number Insight request for each number to get the country associated with it. Finally, the list of countries is saved as a CSV file and returned as a pandas DataFrame.
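Two details matter when writing the countries to CSV. First, .get('country_name') returns None when a response has no country. Second, a country name that contains a comma must stay in one column. The sketch below uses a hypothetical write_countries helper to handle both. It writes each name as a single-element row so that the csv module quotes it, and it skips None values. The sample names are illustrative and are not taken from real API responses.

```python
import csv
import io


def write_countries(countries, fileobj):
    """Write one country per row under a 'country' header; return rows written.

    None values (no country in the response) are skipped. Each name is written
    as a single-element row, so names containing commas are quoted rather than
    split across columns.
    """
    writer = csv.writer(fileobj)
    writer.writerow(['country'])
    written = 0
    for country in countries:
        if country is None:
            continue
        writer.writerow([country])
        written += 1
    return written
```

In analytics.py you would pass a file opened with `open('country_data.csv', 'w', newline='')` in place of the in-memory buffer used for testing.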
from decouple import config
import csv
import vonage
import pandas as pd


class WhatsappAnalytics:
    def __init__(self):
        # Set up the Vonage client with credentials from the .env file
        self.key = config('client_key')
        self.secret = config('client_secret')
        self.client = vonage.Client(key=self.key, secret=self.secret)

    def get_insights(self, contact_list):
        print('Getting number insights')
        data = []
        for contact in contact_list:
            # Advanced Number Insight; keep only the country name
            country = self.client.get_advanced_number_insight(
                number=contact).get('country_name')
            data.append(country)
        # Write one country per row, skipping numbers with no country returned
        with open('country_data.csv', 'w', newline='') as f:
            w = csv.writer(f)
            w.writerow(['country'])  # header
            w.writerows([country] for country in data if country)
        print('Number insights generated successfully')
        dataframe = pd.read_csv('country_data.csv')
        return dataframe
geocoding.py
Next, the string description of the various locations (country names) will be geocoded to create the respective geographic coordinates (latitude/longitude pairs).
This script is made of a GoogleGeocoding class that first loads the Google Maps API key. This class has a geocode_df method that takes a dataframe argument: the countries previously saved. The method aggregates the dataframe by country, counting the contacts in each, and returns the respective latitude and longitude pairs.
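The aggregate-then-geocode step can be sketched without pandas or network access. The hypothetical aggregate_and_geocode helper below counts contacts per country with collections.Counter and geocodes each distinct country once. It accepts any geocode callable that returns results shaped like the Google Geocoding API's, with result['geometry']['location'] holding 'lat' and 'lng', which is the shape geocoding.py reads. Countries that return no results are skipped, which is a design choice of this sketch.

```python
from collections import Counter


def aggregate_and_geocode(countries, geocode):
    """Count contacts per country and attach one coordinate pair per country.

    geocode is any callable taking an address string and returning a list of
    Google-style geocoding results.
    """
    rows = []
    # most_common() orders by count; ties keep first-seen order
    for country, count in Counter(countries).most_common():
        results = geocode(country)  # one request per distinct country
        if not results:
            continue  # skip names the geocoder could not resolve
        location = results[0]['geometry']['location']
        rows.append({'country': country, 'counts': count,
                     'latitude': location['lat'], 'longitude': location['lng']})
    return rows
```

With the client from geocoding.py, you could pass self.gmaps.geocode as the geocode argument. The test below uses a stub with made-up coordinates.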
from decouple import config
import googlemaps


class GoogleGeocoding:
    def __init__(self):
        self.key = config('api_key')
        self.gmaps = googlemaps.Client(key=self.key)

    def geocode_df(self, dataframe):
        print('Preparing for geocoding country code...')
        # Count contacts per country: columns 'country' and 'counts'
        df = (dataframe['country'].value_counts()
              .rename_axis('country').reset_index(name='counts'))
        for index in df.index:
            # Geocode each country once and store its coordinates
            result = self.gmaps.geocode(df.loc[index, 'country'])[0]
            location = result['geometry']['location']
            df.loc[index, 'longitude'] = location['lng']
            df.loc[index, 'latitude'] = location['lat']
        df.to_csv('geocode_data.csv', index=False)
        print('Geocoding completed')
        return df
plotting.py
Next, you will need to map the geospatial data created (latitude and longitude pairs).
Mapmaking is an art; to make the project results aesthetically pleasing, use the Plotly library and Mapbox maps.
This script first loads the Mapbox token and Chart Studio credentials, then defines the SpatialMapping class. The class has two methods, plot_map and plot_bar, that plot the distribution of the WhatsApp group's users as a map and a bar chart.
from decouple import config
import plotly.express as px
import chart_studio
from chart_studio import plotly as py

# Setting credentials
px.set_mapbox_access_token(config('mapbox_public_token'))
cs_username = config('chart_studio_username')
cs_api = config('chart_studio_api')
chart_studio.tools.set_credentials_file(username=cs_username,
                                        api_key=cs_api)


class SpatialMapping:
    def plot_map(self, dataframe):
        fig = px.scatter_mapbox(
            dataframe, lat="latitude", lon="longitude",
            color="counts",
            size="counts",
            color_continuous_scale=px.colors.sequential.Greens,
            size_max=20,
            zoom=1,
            hover_data=["country", 'counts'],
            hover_name='country')
        fig.update_layout(
            title='WhatsApp Analytics: Spatial Mapping of WhatsApp group contacts',
            mapbox_style="dark")
        fig.show()
        print('The link to the plot can be found here: ', py.plot(
            fig, filename='Whatsapp Analytics Map', auto_open=True))

    def plot_bar(self, dataframe):
        fig = px.bar(
            dataframe, x='country', y='counts',
            hover_data=["country", 'counts'],
            color_discrete_sequence=['darkgreen'])
        fig.update_layout(
            title='WhatsApp Analytics: Distribution of WhatsApp group contacts')
        fig.show()
main.py
main.py is the program's entry point. It imports all the script classes, and its main() function passes in the various required parameters, such as the XPaths for the group and the contact element.
from automate import WhatsappAutomation
from analytics import WhatsappAnalytics
from geocoding import GoogleGeocoding
from plotting import SpatialMapping


def main():
    # Collect the group's phone numbers from WhatsApp Web
    automated_object = WhatsappAutomation()
    group_xpath = '//*[@id="pane-side"]/div[1]/div/div'
    contact_xpath = '//*[@id="main"]/header/div[2]/div[2]/span'
    contact_list = automated_object.get_contacts(group_xpath, contact_xpath)
    automated_object.quit()
    # Look up each number's country
    analytics_object = WhatsappAnalytics()
    analytics_df = analytics_object.get_insights(contact_list)
    # Aggregate by country and geocode
    geocoding_object = GoogleGeocoding()
    geo_df = geocoding_object.geocode_df(analytics_df)
    # Plot the results
    spatial_mapping_object = SpatialMapping()
    spatial_mapping_object.plot_map(geo_df)
    spatial_mapping_object.plot_bar(geo_df)


if __name__ == '__main__':
    main()
Try it out
In your terminal, run the main script file as follows:
$ python3 main.py
This will import the various scripts and execute the main() function to yield the desired results.
Results
You can probably already think of many possibilities and use cases for this new piece of knowledge.
Thanks for taking the time to read this article!
Happy Learning!
Aboze Brain John is a Technology Business Analyst at Axa Mansard. He has experience in Data Science and Analytics, Product Research, and Technical Writing. Brain has been engaged in end-to-end data analytics projects ranging from data collection, exploration, transformation/wrangling, modelling, and derivation of actionable business insights and provides knowledge leadership.