Islamic economics is about ethical economic behavior. When we wonder if Sophia, the robo-lady, can develop a “conscience” in addition to “intelligence” with more data and more training, it is about ethics. When concern is expressed about whether she will be accountable for her actions, it is about ethics (we will return to this later). We ended the previous blog with the probability of classification error by our machine. When a poor and needy individual (faqir/ miskeen) is wrongly classified as non-poor and deprived of zakat (a case of false-negative), or a rich individual is wrongly classified as poor and receives zakat (a case of false-positive), it involves ethical dilemmas of varying order. Of course, there are in-built mechanisms within the classification models to address such dilemmas (that is, to balance false-negatives against false-positives). We need to dive deeper into the AI toolkit to unearth more.
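One common way of balancing the two kinds of error is to move the cut-off score above which an applicant is classed as poor. The sketch below is only an illustration: the scores, the true labels and the three thresholds are invented, and a real model may balance errors in other ways.

```python
# Hypothetical model scores: estimated probability that an applicant is poor (faqir/miskeen).
# Each pair is (score, truly_poor). All numbers are invented for illustration.
applicants = [
    (0.95, True), (0.80, True), (0.65, True), (0.40, True),
    (0.70, False), (0.35, False), (0.20, False), (0.05, False),
]

def count_errors(threshold):
    """Return (false_negatives, false_positives) when scores >= threshold are classed as poor."""
    false_negatives = sum(1 for score, poor in applicants if poor and score < threshold)
    false_positives = sum(1 for score, poor in applicants if not poor and score >= threshold)
    return false_negatives, false_positives

for threshold in (0.3, 0.5, 0.75):
    fn, fp = count_errors(threshold)
    print(f"threshold={threshold}: {fn} poor denied zakat, {fp} non-poor receiving zakat")
```

With these made-up numbers, a low threshold denies no poor applicant but lets two non-poor through, and a high threshold does the reverse. Where to set the threshold is therefore an ethical choice as much as a technical one.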
Machine learning involves “training” the machine with more and more data. The process is perhaps similar to training a horse. Training of horses, throughout history, has been a profession of nobility (similar to present-day data scientists!). Horse trainers would tell us about different signals or cues to make a horse move forward, run faster or halt. The horse continues to receive the cues along with actions that show “approval” or “disapproval” by the trainer. Machines are trained in a similar fashion. A machine (horse) learns from the input (cue) and the output (a pat on the back or a spank), builds the logic, and predicts the output for a given input (learns to move, run faster or halt for a given cue).
For those less comfortable with horses and more with numbers, here is a simple example. Suppose we want to add two numbers. In traditional programming, we give the computer the data (a=5, b=6) and the logic (addition), and we get the output (11). In machine learning, instead of giving the program or logic to the computer, we give it the inputs and the output: a=5, b=6 and output=11. The system then attempts to understand how 5 and 6 lead to 11. It continues to improve its understanding with more data (a=3, b=4 and output=7), builds the logic (addition) and then goes on to predict the output for any input.
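The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any production system: we assume the machine is told that the output is a weighted sum `w1*a + w2*b`, and it must discover the weights from examples by gradient descent. The extra examples, the learning rate and the number of steps are all invented for the sketch.

```python
# Traditional programming: we hand the computer the data AND the logic.
def add(a, b):
    return a + b

# Machine learning: we hand over inputs and outputs, and the machine builds the logic.
# Assumption: the model has the form output = w1*a + w2*b; w1 and w2 are learned.
examples = [((5, 6), 11), ((3, 4), 7), ((1, 2), 3), ((2, 7), 9), ((8, 1), 9)]

def train(examples, learning_rate=0.01, steps=1000):
    w1, w2 = 0.0, 0.0  # the machine starts out knowing nothing
    n = len(examples)
    for _ in range(steps):
        grad1 = grad2 = 0.0
        for (a, b), target in examples:
            error = (w1 * a + w2 * b) - target  # how far off is the current guess?
            grad1 += 2 * error * a / n          # gradient of the mean squared error
            grad2 += 2 * error * b / n
        w1 -= learning_rate * grad1             # nudge the weights to reduce the error
        w2 -= learning_rate * grad2
    return w1, w2

w1, w2 = train(examples)

def predict(a, b):
    return w1 * a + w2 * b

print(add(5, 6))                      # logic supplied by the programmer
print(round(w1, 3), round(w2, 3))     # weights discovered from the examples
print(round(predict(20, 22), 3))      # prediction for inputs never seen in training
```

Both learned weights end up very close to 1, which is to say the machine has rediscovered addition from the examples alone.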
Again, at the risk of digressing a bit, and in a rather lighter vein: does the contrast in our examples above have any historical or philosophical significance? As some contemporary thinkers have said, “Data is new oil”. Some have spoken about “peak oil” which, of course, was preceded by “peak horse”, which occurred in the early 1900s as a result of the advent of combustion engines. Well, Islamic traditions have, through the ages, underlined the perpetual “goodness of the horse”. A horse is always for the connoisseur.
And just as there have been different training methods for a horse, depending upon whether it was being trained for racing, pulling a carriage, war or delivering post, there are different types of machine learning. Three broad types are supervised, unsupervised and reinforcement learning.
First, let us consider supervised learning, which is the most commonly used type of ML. Machine learning takes data as input. Let’s call this data “training data”. In supervised learning, the training data includes both inputs and labels (targets). Going back to the earlier numerical example, the inputs are 5 and 6 and the target is 11.
We first train the model with lots of training data (inputs and targets). Then, with new data and the logic the model has learned, we predict the output. Note that we may not get exactly 11 as the answer; we may get a value close to 11, depending on the training data and the algorithm. Two types of problems are addressed through supervised learning: regression and classification. Basically, classification separates the data. Regression fits the data.
Regression: This is a type of problem where we need to predict the continuous-response value (we predict a number which can vary from -infinity to +infinity). In our earlier zakat estimation example, we need to predict the prices of gold, silver, shares, value of inventories, agricultural produce, lease rental rates/ prices of houses and other assets etc.
Classification: This is a type of problem where we predict the categorical response value where the data can be separated into specific “classes” (we predict one of the values in a set of values). We can add a few more to our earlier examples. Note that the initial six classifications are binary ones (yes or no), while the last three are examples of multi-class classification.
- Is individual A in the poor (faqir) category of mustahiq or not?
- Is individual B in the traveler (ibn sabeel) category of mustahiq or not?
- Are donors to Global Sadaqah (a crowdfunding platform) a satisfied lot or not?
- Will donors to Global Sadaqah revert as repeat-donors or not?
- Is individual C going to default on his repayment of qard or not?
- Is this picture of Brother X, or not?
The next three questions are answered on a scale: very high/ high/ moderate/ low/ very low.
- What are the chances that a transaction by individual E will prove to be a fraudulent one?
- What are the chances that zakat beneficiary F receiving certain skills will be able to transform himself/ herself into a muzakki-entrepreneur?
- What are the chances that project D will utilize zakat funds in the Shariah-stipulated way?
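For the multi-class questions, one simple way to arrive at one of the five answers is to take an estimated probability and place it in a band. The sketch below assumes exactly that; the cut-offs are hypothetical, and a multi-class model could equally be trained to predict the class directly.

```python
# Hypothetical cut-offs (lower bound of each band); an institution would set these by policy.
LEVELS = [
    (0.8, "very high"),
    (0.6, "high"),
    (0.4, "moderate"),
    (0.2, "low"),
    (0.0, "very low"),
]

def chance_level(probability):
    """Map an estimated probability (0 to 1) to one of five ordered classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for lower_bound, label in LEVELS:
        if probability >= lower_bound:
            return label

# Example: an invented fraud probability of 0.07 for a transaction by individual E.
print(chance_level(0.07))
```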
In unsupervised learning, the training data does not include targets. So, we don’t tell the system where to go. The system has to work things out for itself from the data we give it, by finding patterns in the data. Here the training data is unlabelled.
There are also different types of unsupervised learning, such as clustering and anomaly detection. Clustering is a type of problem where we group similar things together. It is perhaps similar to multi-class classification, but here we don’t provide the labels; the system works out the groups from the data itself.
Some examples are:
- given questions or comments on a zakat portal, cluster them into payer-types
- given a set of tweets on a volunteering portal, cluster them based on their content
- given WhatsApp forwards in Islamic women’s groups, cluster senders into different types
Unsupervised learning is relatively more difficult to implement and not used as widely as supervised.
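To make clustering concrete, here is a minimal sketch of k-means, one widely used clustering method, on a single numeric feature. The monthly donation amounts are invented, and the choice of k-means is ours for illustration. Text data such as comments or tweets would first have to be converted into numbers, a step this sketch leaves out.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters (k >= 2) without being given any labels."""
    # Deterministic start: spread the initial centres evenly between min and max.
    lowest, highest = min(values), max(values)
    centres = [lowest + i * (highest - lowest) / (k - 1) for i in range(k)]
    for _ in range(iterations):
        # Step 1: assign every value to its nearest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[nearest].append(v)
        # Step 2: move each centre to the mean of its cluster (keep it if the cluster is empty).
        centres = [sum(c) / len(c) if c else centres[i] for i, c in enumerate(clusters)]
    return centres, clusters

# Hypothetical monthly donation amounts from nine donors.
donations = [10, 12, 15, 11, 200, 220, 210, 14, 190]
centres, clusters = kmeans_1d(donations, k=2)
print(centres)
print(clusters)
```

Nobody told the machine that there are “small” and “large” donors; the two groups emerge from the data.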
We can perhaps leave the technical details out, if we see ourselves simply as business users of machine learning algorithms. We don’t have to build our own machine learning model. We can seek the services of dedicated data scientists. As Islamic finance or social finance professionals, we may come across machine learning models that do one or more of the above tasks. Once we are done asking what it does and why we should need it, our next set of questions ought to be: how good is it, should we trust it?
We need to know a few things about model evaluation. How good are the model’s predictions? Interestingly, we can find this out by comparing what the model predicted with what we already know, but haven’t shared with the machine. For example, our Akhuwat (qard-fund) manager is interested in a model that can correctly classify its new beneficiaries into good and bad borrowers (from the default point of view). It has data for the past two decades on its beneficiaries, including the minuscule population of defaulters. It knows who defaulted and who did not. Let us say it shares the first twelve years of data (60 percent) with the machine to train the prediction model, that is, to find the model parameters that minimize errors. This set is called the “training dataset”. Then it uses the next four years of data (20 percent) to choose between alternative models. This set is called the “validation dataset”. Finally, it uses the last four years of data (20 percent), which the model has never seen, to estimate how often the chosen model errs on new cases. This set is called the “test dataset”.
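The chronological split can be sketched as follows. Everything here is hypothetical: the records (one invented borrower per year), the years in which defaults occur, and the stand-in “model”, which simply predicts that nobody defaults. The last slice is held back and used only to compare predictions with outcomes we already know.

```python
# Hypothetical records: one borrower per year for 20 years, with invented defaults.
records = [{"year": y, "defaulted": y in (7, 18)} for y in range(1, 21)]

def split_by_year(records, train_years=12, validation_years=4):
    """Split chronologically: earliest years to train, next to validate, the rest held back."""
    ordered = sorted(records, key=lambda r: r["year"])
    train_end = ordered[0]["year"] + train_years
    validation_end = train_end + validation_years
    train = [r for r in ordered if r["year"] < train_end]
    validation = [r for r in ordered if train_end <= r["year"] < validation_end]
    held_back = [r for r in ordered if r["year"] >= validation_end]
    return train, validation, held_back

def accuracy(predict, dataset):
    """Share of records where the prediction matches what actually happened."""
    correct = sum(1 for r in dataset if predict(r) == r["defaulted"])
    return correct / len(dataset)

# Stand-in "model" (an assumption, not a trained model): predicts that nobody defaults.
def never_defaults(record):
    return False

train, validation, held_back = split_by_year(records)
print(len(train), len(validation), len(held_back))
print(accuracy(never_defaults, held_back))
```

Because the held-back years were never shown to the model, its score on them is a fairer guide to how it will behave with new beneficiaries than its score on the training years.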
Now, one may argue that there is no business case for Akhuwat type organizations to use our prediction models. An organization which enjoys minimal defaults (around one percent) may not find the idea of employing our prediction algorithms a good business proposition. The issue of cost versus benefits of such an exercise is certainly relevant. It perhaps serves well those entities that are experiencing high credit default risk and don’t have a clue on how to tackle it. At the same time such models may offer alternative value propositions to Islamic microfinance institutions that may like to predict the outcomes of their skill-enhancement initiatives – predict potential winners among micro-entrepreneurs based on data that go beyond their education and apparent competencies.
There is merit in the idea that machine learning models make predictions based on data that is beyond what is traditionally used to assess credit-worthiness (such as, proof of income or employment) or entrepreneurial traits in individuals. Poor people are financially excluded primarily because they are data excluded. ML models use “alternative data” that have no apparent relationship with financial and business capabilities of the client, such as, the number of contacts on one’s phone, the make of the phone, one’s average mobile top-up value, one’s online access patterns, indeed, anything relating to their digital footprint. Apparently of low value, there are hundreds of such data types that are now available to our machine to explore and find relationships with loan repayment or entrepreneurial success. At the same time, use of “alternative data” raises serious ethical issues, especially from an Islamic point of view. Islamic societies place huge value on “privacy”, and for good reasons. Can our machines come up as winners in the face of this ethical constraint? We leave this question for the next blog.
(To be continued)
 Squeezing the horse with one’s legs is the signal that it should move forward. To make a horse run faster, the cue is to give it a short, verbal command like “trot” or “gallop” in a soft, gentle voice. To cue for a halt, one should close one’s fingers and squeeze backward.
 Before competing in a race, a horse had to undergo a period of training, termed tadmir or idmar, which lasted for some forty to sixty days. (Encyclopaedia of Islam, 2:953)