When Models Pick Sides: How AI Learns to Discriminate
Algorithms have cut the time a credit decision takes from days to seconds. That does not make a rejection feel any less hurtful. Be it a preference to pay in cash or a move to a different country, sometimes people’s lifestyles conflict with ‘what looks good’ in a credit file. When algorithmic models disfavour an entire socioeconomic group, the causes lie much deeper and the consequences are far more severe, as we will explore in this third article of Layered Beliefs.
“Colleague-Must-Vent” Read
A quick executive summary for busy professionals
- Bias can be introduced at the technical level when models are trained on data in which multiple, seemingly neutral variables correlate to form a “proxy” for protected characteristics.
- Models trained on text can implicitly adopt the meta structure of that text, such as political or cultural biases.
- Data analytics is a key technical method for identifying potential bias in models. Internal audit can use statistical techniques to analyse correlations between input variables to find proxies for protected groups. Crucially, audit can compare model output rates, like approvals or credit limits, across different subgroups to detect disparate impacts.
When I moved to the UK, I thought I was invincible. I had not planned it, but a move abroad (again), paid relocation, a promotion, and a beautiful flat made me feel untouchable. Right up until a Sunday afternoon when I tried to get a mobile phone contract.
I failed the credit check. Not because I had messed up my credit. I did not have any credit history. In the other countries I had lived in, there were no central credit agencies; banks just wanted to see your employment contract. UK phone companies were not interested. No history, no contract. Only later did I realise how lucky I was: my landlord accepted a letter from my employer. Many others would have asked for six months’ rent up front. In London, that is brutal.
From a pure risk view, none of this made sense. The phone contract was maybe £20 a month, and they could have cut me off instantly if I did not pay — a tiny potential loss versus a customer for life. The bank, on the other hand, barely blinked. First branch I walked into handed me a credit card with a limit far above the salary of a job I had not even started, based on a suit, a transfer letter and a Canary Wharf office address.
From a business perspective, it was perfectly rational. Having someone at the phone company manually review my case would cost more than they would ever earn on my contract. Easier to just say no. The bank saw something else: future earnings, future products, eventually a mortgage.
What happened to me was inconvenient. But when algorithms learn to discriminate, the consequences compound across generations. The question is not whether AI can be biased. It’s whether we can detect bias that even the model’s creators do not understand.
Redlining
In the first half of the 20th century, US mortgage lenders drew large colour-coded maps of around 200 cities. A property in a green zone was highly desirable to finance; in a red zone, it was effectively un-lendable. As it turned out, the red areas were largely occupied by people of colour, the green areas by White Americans.
This practice, known as redlining, was discrimination by zip code. Because different groups had historically clustered in particular neighbourhoods, zip code became a proxy for race and socioeconomic status. By refusing to lend in red areas, or only doing so at much higher interest rates, lenders starved these communities of credit, pushed down property values and concentrated poverty. Over decades, the vicious circle became a self-fulfilling prophecy.
One might be tempted to give lenders the benefit of the doubt and blame “historic inequities” that politicians should have fixed, arguing that banks simply needed to avoid risk and protect profitability. But that is hard to sustain.
The Federal Housing Administration’s Underwriting Manual explicitly stated that neighbourhoods needed to remain occupied by the same social and racial classes to “retain stability” and warned against “infiltration by inharmonious racial or nationality groups” (Federal Housing Administration, Underwriting Manual, 1938, Part II, Sec. 937).
Redlining and related discriminatory practices were made illegal from the 1960s onwards. Banks could no longer use zip codes directly as inputs to their lending decisions. Nevertheless, neighbourhood-based discrimination persisted. A 2010 investigation found that US mortgage lenders were using scorecards whose variables allowed them to infer the location of a property, and with high probability the borrower’s socioeconomic group.
Individually, these variables — often presented as neutral measures of “regional risk” — would not reliably identify an applicant’s background. But in combination, factors such as property appraisal, requested loan-to-value ratio, income, and even the choice of real-estate agent gave lenders a clear enough picture of where someone lived, and therefore a proxy for race.
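To make the mechanics concrete, here is a minimal sketch of how an audit might test for such a combined proxy. The data are entirely synthetic and the variable names are illustrative assumptions, not the lenders’ actual inputs: each variable on its own only weakly predicts the protected group, but a simple model trained on all of them together does noticeably better.

```python
# Synthetic illustration of a combined proxy: no single "neutral" variable
# reveals the protected group on its own, but together they do. Invented data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 5_000
group = rng.integers(0, 2, n)  # hypothetical protected attribute (never a model input)

# Seemingly neutral application variables, each only weakly shifted by group
appraisal = rng.normal(300 - 40 * group, 80, n)        # property appraisal (thousands)
ltv       = rng.normal(0.70 + 0.06 * group, 0.10, n)   # requested loan-to-value
income    = rng.normal(60 - 8 * group, 20, n)          # stated income (thousands)
agent     = rng.binomial(1, 0.35 + 0.30 * group, n)    # used a particular agent network

X = np.column_stack([appraisal, ltv, income, agent])
clf = make_pipeline(StandardScaler(), LogisticRegression())

# Each variable alone is a weak predictor of the protected group ...
for name, col in zip(["appraisal", "ltv", "income", "agent"], X.T):
    acc = cross_val_score(clf, col.reshape(-1, 1), group, cv=5).mean()
    print(f"{name:9s} alone: {acc:.2f} accuracy")

# ... combined, they predict it noticeably better than any single one.
print(f"all combined   : {cross_val_score(clf, X, group, cv=5).mean():.2f} accuracy")
```

The point is not the specific numbers but the pattern: proxies rarely live in a single field; they emerge from combinations.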
Discrimination by Proxy
Banks defended their practices with economic arguments, such as higher default rates or falling property values, or claimed that the correlation between race and these variables had been unknown or accidental.
It is plausible that economic considerations were a primary driver for some banks, and ‘hidden’ correlations do exist, as we will see in the next case. However, according to a report from the U.S. Department of Housing and Urban Development, in some cities a Black person with the same income as a White person was four times as likely to end up with only a subprime mortgage. Given how widespread this divergence was across the country, it is reasonable to conclude that racism was a key driver in many of these decisions.
As a result, various regulations and transparency requirements were introduced. Two technological developments in the banking world also promised less discriminatory credit decisions: big data and algorithms.
In 2019, the Apple Card launched as a competitor to traditional credit cards, using an algorithm that was highly advanced at the time. Not long after the launch, accusations of gender bias were raised. Many users reported that women were granted lower credit limits than men with similar incomes. In some cases, women received lower limits than their husbands despite sharing finances.
An applicant’s sex was not an input to the model. An investigation by the New York State Department of Financial Services (NYSDFS) found weaknesses that required remediation, but no evidence of deliberate discrimination or unlawful unintentional discrimination. What happened?
How exactly the algorithm produced these disparate outcomes for women is not entirely clear. However, it was trained on historical data and must have identified women through proxy variables such as surname changes, consumer behaviour, career breaks or part-time work arrangements, and then correlated these with patterns like lower and less steady income or financial hardship after a divorce.
This does not mean the algorithm thought to itself: “Aha, here we have a woman, let’s discriminate.” Given that sex was not a data field, it might not even have known what a woman is. It simply learned that applicants with certain characteristics had historically been more likely to have difficulties paying off their debt, and assumed this would also be the case in the future.
The algorithm looked at applicants as individuals and did not consider joint finances. If, for example, the husband was the primary applicant on a joint mortgage, or he took out the car loan for the family van, the monthly payments his wife contributed to would count towards his credit score. This gave him a larger, more positive credit file while his wife was left with a short and patchy credit history.
The NYSDFS concluded that the thickness of the credit file in particular was a key driver of the algorithm’s decisions, an attribute that disproportionately affected women.
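To see how an attribute like file thickness can translate into unequal outcomes without sex ever being an input, here is a minimal simulation. Everything in it, the feature names, the coefficients and the approval threshold, is an invented assumption for illustration, not the Apple Card model.

```python
# Minimal synthetic simulation: a model with no "sex" input can still produce
# unequal outcomes when a permitted feature (credit-file thickness) is itself
# unequally distributed between groups. All numbers are invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
sex = rng.integers(0, 2, n)                    # 0 = men, 1 = women; never shown to the model

# Historical artefact: joint accounts sit under the primary applicant,
# so one group ends up with thinner credit files on average.
file_thickness = rng.poisson(6 - 2 * sex)      # number of accounts on file
income = rng.normal(50, 15, n)                 # similar incomes in both groups

# Past defaults are driven by thin files and low income, not by sex itself
z = 0.4 * file_thickness + 0.02 * income - 2.0
default = rng.binomial(1, 1 / (1 + np.exp(z)))

X = np.column_stack([file_thickness, income])  # note: sex is NOT a feature
model = LogisticRegression(max_iter=1000).fit(X, default)

# Approve everyone the model scores as low risk, then compare rates by sex
approved = model.predict_proba(X)[:, 1] < 0.15
for s, label in [(0, "men"), (1, "women")]:
    print(f"approval rate, {label}: {approved[sex == s].mean():.0%}")
```

In the sketch, leaving the protected attribute out does nothing to remove the disparity; the model simply reads it off the correlated feature, which mirrors the role the NYSDFS ascribed to credit-file thickness.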
“Models, despite their reputation for impartiality, reflect goals and ideology. Our own values and desires influence our choices, from the data we choose to collect to the questions we ask. Models are opinions embedded in mathematics.” Cathy O’Neil (“Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy”, 2016)
When the Data Make the Bias
Apple Card’s problem was that many factors in financial and personal data correlate, and it is hard to isolate one characteristic from the others. Removing sex from a credit application does not eliminate lifestyle differences between sexes.
Therefore, models and AI trained on these data can identify patterns deeply hidden from us; they can discriminate against certain groups unbeknownst to their developers. Like many issues and risks with AI, the problem is not new but significantly amplified.
Unlike the case of the mortgage lenders, there was no clear indication that the issues with Apple Card were deliberate. It was the data determining the outcome.
In one of his many appearances on Joe Rogan’s podcast, Elon Musk wanted to demonstrate that Grok, X’s AI, is not skewed by “political correctness” — an accusation often raised against other LLMs. It did not go as planned. Grok refused to make any ‘politically incorrect’ jokes, and the jokes it told were far too lame for Rogan and Musk.
Many were quick to point out that if even Grok does not want to touch certain topics, it must be differentiating between right and wrong. A different interpretation of Grok’s behaviour was that the data it was trained on were contaminated by ‘woke ideology’. In the end, both were wrong.
Something Deeply Hidden
Data are not neutral. Even if we eliminate the obvious markers of difference — gender, race, zip code — the data still contain what those markers meant historically, at a much deeper level, maybe even in ways we will never understand.
As AI systems become more advanced, they will dig deeper into this meta level of the data. If humans cannot tell them that decisions based on a certain pattern they have spotted are discriminatory, they will not know.
The fascinating facet of this is that it is not only statistical socioeconomic patterns they learn, but also the values and moral constructs of our society. Not because they are conscious and understand meaning. Models do not know meaning. They produce sentences based on probabilities. The text they are trained on contains meaning, but they cannot independently infer meaning. But they will still replicate what they learned.
If a Large Language Model (LLM) were trained only on the writings of Karl Marx, that would be its entire world and all it knew. If you asked it for a balanced opinion on economic policy, it would still be very Marxist.
Grok did not know right from wrong. At the same time, not making fun of minority groups is part of the value system of people across the political spectrum; it is also deeply rooted in many religions and atheist philosophies. It is therefore implicit in the meta structure of training data. You do not need ‘woke ideology’ for that.
Many LLMs are trained on the internet. I hope this does not come as a shock, but not everything you read on the internet is true.
Wrap-Up
“Opinions embedded in mathematics” is how Cathy O’Neil describes models in her politically charged but technically thorough book “Weapons of Math Destruction”. Even models based purely on data can be biased; this is inevitable. If model owners, Model Risk Management and Audit are cognisant of that, mistakes and discrimination can be identified early.
But this will also open the door to challenges and questions around model outputs. Like fake news and scientific conspiracies, ‘contaminated input data’ or something similar could become a new slogan to spread doubt, until no one knows what to believe. That is a problem idiosyncratic to AI models, as we will soon explore.
For internal audit and model risk, this creates challenges, but also opportunities. Ten years ago, we were already using statistical correlations to identify patterns in trading data. Finding such patterns in the training sets of AI is much more complex, but as risk and audit start to use AI in their own work, identifying discriminatory patterns in training data is an excellent use case. It will enable audit to compare model output rates, like approvals or credit limits, across different subgroups and potentially detect disparate impacts, as sketched below.
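As a sketch of what such a check could look like in practice, the function below compares positive-outcome rates across subgroups and flags large gaps. The column names, the made-up decisions and the 0.8 cut-off (the informal “four-fifths” rule of thumb) are assumptions for illustration, not a legal or regulatory test.

```python
# Minimal audit sketch: compare model outcomes across subgroups and flag
# potential disparate impact. Synthetic data; 0.8 is a common rule of thumb.
import pandas as pd

def disparate_impact_report(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.DataFrame:
    """Positive-outcome rate per group, relative to the best-treated group."""
    rates = df.groupby(group_col)[outcome_col].mean().rename("positive_rate")
    report = rates.to_frame()
    report["ratio_vs_best"] = report["positive_rate"] / report["positive_rate"].max()
    report["flag"] = report["ratio_vs_best"] < 0.8   # review anything below four-fifths
    return report

# Illustrative usage with made-up audit data
decisions = pd.DataFrame({
    "subgroup": ["A"] * 500 + ["B"] * 500,
    "approved": [1] * 400 + [0] * 100 + [1] * 280 + [0] * 220,
})
print(disparate_impact_report(decisions, "subgroup", "approved"))
```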
And Grok? A few weeks after the podcast aired, Grok had a change of mind and started telling more controversial jokes.
Disclaimer
This publication is for general information purposes only and does not constitute professional, financial or investment advice. Technical descriptions have been simplified for the benefit of a broad audience. The article may employ hyperbole, irony, or other rhetorical devices; not every statement should be interpreted literally.
The opinions expressed are solely those of the author and may include alternative perspectives presented for discussion or editorial purposes. They do not necessarily reflect the views of any organisation the author represents.
Any company names, quotes, or individuals mentioned serve as historical examples. They are used illustratively, without endorsement or criticism beyond the factual context. All forward-looking statements reflect personal views and are not guarantees of future outcomes.
Originally published at https://www.linkedin.com.