Ethical Software Engineering Lab Course

There is now, at long last, wide concern over the negative effects of technology, along with calls to teach ethics to engineers. But critique is not enough. What tools are available to the working engineer to identify and mitigate the potential harms of their work?

I’ve been teaching the effects of technology on society for some time, and we cover a lot of it in my computational journalism course. This is an outline for a broader hands-on course, which I’m calling the Ethical Engineering Lab.

This eight-week course is a hands-on introduction to the practice of what you might call harm-aware software engineering. I’ve structured it around the Institute for the Future’s Ethical OS, a framework I’ve found useful for categorizing the places where technology intersects with personal and social harm. Each class is three hours long, split between lecture and lab time. Students must complete a project investigating actual or potential harms from technology, and their mitigations.

Each lecture is structured around a set of issues (cases where technology is or could be involved in harm) and tools (methods for mitigating these harms). The goal is to train students in the current state of the art on these problems, which often requires a deep dive into both the social and technical perspectives. For example, we will study both differential privacy algorithms and HIPAA health data privacy rules. In many cases there is disagreement over whether certain harms are likely and how serious they are, so we will explore the tradeoffs of possible design choices.

Our hands-on exploration (lab time and final projects) will involve a combination of qualitative and quantitative methods. For example, we might read the EULAs of all the products we use and see if there are any surprises. Or we might use a Jupyter notebook with real data from the COMPAS criminal justice risk assessment algorithm to investigate the tradeoffs between different definitions of quantitative fairness. I’ve included example projects that students could do in each area.

Some technical background is required, as the goal is to teach the engineering aspects of these problems. Many but not all final projects will require coding. I particularly encourage students to choose a project related to their work.

This post is just a sketch to suggest the sort of material I’d want to include. Doubtless, a great many things are missing. What else should we cover? What references are especially good on these topics? Do you want me to teach this at your organization? Get in touch.

Truth, Disinformation, Propaganda

Issues

Overview of recent disinformation campaigns (2016 election and globally).
Disinformation spreads farther than truth.
Review of current state-of-the-art in audio, video, text, and photo deepfakes.
Defining “propaganda.” The ethics of persuasion.
What the most advanced chatbots can do today.

Tools

Contemporary institutional counter-disinformation practices.
Recommendation algorithms: design patterns for various social goals.
Moderation system design.
How Facebook responds to information operations.
Associated Press guidelines for identifying machine-written content.

Discussions

How could your technology be used as part of a disinformation campaign?

Example Projects

Build a chatbot that impersonates a person or company. See if you can fool your classmates.
Build a fake news classifier from one of the common fake news datasets. What signals does it end up learning? Can it be made to work at scale?
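A fake news classifier can start as a plain bag-of-words model. The sketch below is a multinomial naive Bayes classifier in standard-library Python; the four training examples are invented only to exercise the code, and you would substitute (text, label) pairs from a real dataset. The `top_signals` helper (a hypothetical name chosen for illustration) lists the words whose likelihood ratio most favors one label, which is a quick way to see what signals the model has learned.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep runs of letters and apostrophes; deliberately crude.
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def log_prob(self, word, label):
        # Smoothed log P(word | label).
        return math.log((self.word_counts[label][word] + 1)
                        / (self.totals[label] + len(self.vocab)))

    def predict(self, text):
        n = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / n)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += self.log_prob(word, label)
            scores[label] = score
        return max(scores, key=scores.get)

    def top_signals(self, label, other, k=5):
        # Hypothetical helper: words whose likelihood ratio most favors `label`.
        ratio = {w: self.log_prob(w, label) - self.log_prob(w, other) for w in self.vocab}
        return sorted(ratio, key=ratio.get, reverse=True)[:k]


# Invented examples only; substitute (text, label) pairs from a real dataset.
docs = [
    "SHOCKING secret cure doctors hate, share before it's deleted",
    "You won't believe this shocking miracle",
    "City council approves budget after public hearing",
    "Study published in journal finds modest effect",
]
labels = ["fake", "fake", "real", "real"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("shocking miracle cure"))  # fake
print(model.top_signals("fake", "real"))
```

On a real dataset, it is worth checking whether the top signals have anything to do with truthfulness, or whether they mostly capture style and source.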

Addiction & The Dopamine Economy

Issues

Addiction psychology, through the example of gambling and casino design.
Defining “engagement” and the effects of optimizing for it.
Effects of screens on sleep.
“Ultra-FOMO”: What do constant images of perfection do to us?

Tools

“time well spent” metrics; well-being research
Screen time reports
Human and algorithmic approaches to evaluating content quality

Discussions

What would addiction look like on your platform?
How can your business make money without addiction?

Example Projects

Estimate the quantitative effect of removing a particular addictive feature. Or implement the change in your product and find out.
Build a machine learning system that ranks content by “quality,” in the “time well spent” sense. What measure are you using, and why, and how does the classifier perform relative to this standard?
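For the first project, a natural design is a randomized experiment: remove the feature for a random subset of users and compare a usage metric between the groups. A minimal sketch, assuming you already have per-user daily minutes for each group (the numbers below are placeholders, not real measurements), estimates the difference in means with a normal-approximation 95% confidence interval.

```python
import math
from statistics import mean, variance


def diff_in_means_ci(control, treatment, z=1.96):
    """Difference in means (treatment - control) with a Welch-style
    normal-approximation confidence interval (z=1.96 for roughly 95%)."""
    diff = mean(treatment) - mean(control)
    se = math.sqrt(variance(control) / len(control)
                   + variance(treatment) / len(treatment))
    return diff, (diff - z * se, diff + z * se)


# Placeholder minutes-per-day values; replace with your experiment's data.
control = [52, 61, 47, 70, 58, 66, 49, 63]
treatment = [44, 50, 39, 61, 47, 55, 42, 52]  # feature removed
diff, (low, high) = diff_in_means_ci(control, treatment)
print(f"effect: {diff:.1f} min/day, 95% CI [{low:.1f}, {high:.1f}]")
```

With groups this small a t-distribution would give a wider, more honest interval; the normal approximation is more appropriate for large experiments.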

Economic and Asset Inequalities

Issues

Personalized pricing can charge poorer people more. This doesn’t have to be intentional; a very simple three-line algorithm will do it.
Pricing AIs will collude to fix prices.
Auto insurance continues to be more expensive in minority neighborhoods, even after adjusting for risk.
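To see how a pricing rule can disadvantage poorer customers without ever looking at income, consider the hypothetical rule below: charge a 10% markup whenever the customer has no competing store nearby. The customer list is made up, and the resulting disparity depends entirely on the assumption that lower-income customers are more often far from a competitor.

```python
def price(base, has_nearby_competitor, markup=1.10):
    # Hypothetical rule: income never appears in these three lines.
    if has_nearby_competitor:
        return base
    return round(base * markup, 2)


# Hypothetical customers: (income_group, has_nearby_competitor).
# Assumption: lower-income customers more often lack a nearby competitor.
customers = [
    ("low", False), ("low", False), ("low", True), ("low", False),
    ("high", True), ("high", True), ("high", False), ("high", True),
]


def average_price_by_group(customers, base=100.0):
    totals, counts = {}, {}
    for group, nearby in customers:
        totals[group] = totals.get(group, 0.0) + price(base, nearby)
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


print(average_price_by_group(customers))  # {'low': 107.5, 'high': 102.5}
```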

Example Projects

Reproduce the simulation which showed that pricing algorithms will collude. Under what conditions will this happen? How can AIs be designed not to do this?
Analyze real lending data to determine the demographics of who gets a loan now, and how that would change if better prediction were available, as this notebook does.
Simulate personalized pricing, using a model to estimate the willingness to pay of different demographics (location, age, etc.). How will this affect the distribution of prices across income levels?
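The collusion finding came from letting reinforcement-learning agents set prices against each other over many rounds. The sketch below is a heavily simplified setup in that spirit, not a reproduction of the original study: two independent Q-learning sellers choose from a small price grid in a repeated market with hypothetical linear demand, and each conditions on the previous round's prices. On this grid the pure-strategy one-shot equilibria are at prices 1.0 and 1.5, while 2.5 maximizes joint profit, so average prices well above 1.5 would suggest tacit coordination. Whether that happens depends heavily on the discount factor, exploration schedule, and number of rounds, which are exactly the conditions the project asks about.

```python
import random

PRICES = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical discrete price grid
COST = 1.0                           # hypothetical marginal cost


def profits(p1, p2):
    """Hypothetical Bertrand-style market: demand falls linearly with the
    lowest price, the cheaper seller takes all of it, and ties split it."""
    demand = max(0.0, 4.0 - min(p1, p2))
    if p1 < p2:
        return (p1 - COST) * demand, 0.0
    if p2 < p1:
        return 0.0, (p2 - COST) * demand
    return (p1 - COST) * demand / 2, (p2 - COST) * demand / 2


def run(rounds=100_000, alpha=0.1, gamma=0.95, tail=1000, seed=0):
    """Two independent Q-learners repeatedly set prices. The state is the
    pair of prices from the previous round. Returns each seller's average
    price over the final `tail` rounds."""
    rng = random.Random(seed)
    n = len(PRICES)
    q = [[[0.0] * n for _ in range(n * n)] for _ in range(2)]  # q[seller][state][action]
    state = rng.randrange(n * n)
    totals = [0.0, 0.0]
    for t in range(rounds):
        eps = max(0.01, 1.0 - t / (0.8 * rounds))  # decaying exploration rate
        actions = []
        for i in range(2):
            row = q[i][state]
            if rng.random() < eps:
                actions.append(rng.randrange(n))       # explore
            else:
                actions.append(row.index(max(row)))    # exploit
        rewards = profits(PRICES[actions[0]], PRICES[actions[1]])
        next_state = actions[0] * n + actions[1]
        for i in range(2):
            target = rewards[i] + gamma * max(q[i][next_state])
            q[i][state][actions[i]] += alpha * (target - q[i][state][actions[i]])
        if t >= rounds - tail:
            totals[0] += PRICES[actions[0]]
            totals[1] += PRICES[actions[1]]
        state = next_state
    return totals[0] / tail, totals[1] / tail


print(run())
```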

Machine Ethics & Algorithmic Biases

Issues

A framework for thinking about analyzing data for evidence of bias.
Google shows ads for higher-paying jobs to men.
Sexism in word embeddings: “Man is to programmer as woman is to housewife.” However, there was an instructive error in the research that led to this particular example.
Better loan payback prediction will increase disparities in interest rates.
Ethical problems with prediction in criminal justice generally.
Prediction feedback loops.

Tools

Introduction to quantitative fairness measures. Three classic types, their advantages and drawbacks: 1) Demographic parity: hire men and women at the same rate. 2) Equal error rates: make sure the classifier’s false positive and false negative rates are the same for different races. 3) Calibration: ensure a given prediction means the same thing (the same probability of the outcome) for all groups.
Stanford’s Law, Bias, and Algorithms course notebooks.
Impossibility theorem: when base rates differ between groups, an imperfect classifier can generally satisfy only one of these at once, so choosing a type of fairness is a policy choice.
Real-world outcomes. After recidivism prediction was introduced in Kentucky, judges initially reduced detention rates in line with the computed risk scores, but the effect gradually wore off. A detailed analysis of the effect of predicting which children are likely to require intervention by child protection services.
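The three measures can be computed from a table of predictions and outcomes. The sketch below assumes binary predictions, so calibration is approximated by equal positive predictive value (PPV) across groups, often called predictive parity; the confusion-matrix counts and the helper names are made up for illustration. It also checks an accounting identity behind the impossibility result: for each group, FPR = p/(1-p) · (1-PPV)/PPV · (1-FNR), where p is the base rate. If two groups have different base rates but equal PPV, they cannot also have equal FPR and FNR, except in degenerate cases such as a classifier with no false positives.

```python
from collections import defaultdict


def group_rates(records):
    """records: iterable of (group, predicted, actual), with 0/1 values.
    Assumes every group has at least one actual positive, one actual
    negative, and one predicted positive (otherwise a rate divides by zero)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for group, pred, actual in records:
        key = ("t" if pred == actual else "f") + ("p" if pred else "n")
        counts[group][key] += 1
    rates = {}
    for group, c in counts.items():
        n = sum(c.values())
        pos, neg = c["tp"] + c["fn"], c["fp"] + c["tn"]
        flagged = c["tp"] + c["fp"]
        rates[group] = {
            "base_rate": pos / n,
            "positive_rate": flagged / n,  # compared under demographic parity
            "fpr": c["fp"] / neg,          # FPR and FNR: equal error rates
            "fnr": c["fn"] / pos,
            "ppv": c["tp"] / flagged,      # PPV: predictive parity
        }
    return rates


def fpr_from_identity(r):
    # FPR = p/(1-p) * (1-PPV)/PPV * (1-FNR), where p is the group's base rate.
    p = r["base_rate"]
    return p / (1 - p) * (1 - r["ppv"]) / r["ppv"] * (1 - r["fnr"])


def make_group(name, tp, fp, fn, tn):
    # Hypothetical confusion-matrix counts expanded into individual records.
    return ([(name, 1, 1)] * tp + [(name, 1, 0)] * fp
            + [(name, 0, 1)] * fn + [(name, 0, 0)] * tn)


records = make_group("A", tp=4, fp=1, fn=1, tn=4) + make_group("B", tp=1, fp=1, fn=1, tn=7)
for group, r in group_rates(records).items():
    print(group, {k: round(v, 3) for k, v in r.items()},
          "FPR via identity:", round(fpr_from_identity(r), 3))
```

With these made-up counts the groups differ in base rate (0.5 vs. 0.2) and on every fairness measure; adjusting the counts to equalize one measure is a quick way to watch the others move apart.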

Example Projects

Quantify the tradeoffs between different types of fairness in the COMPAS criminal justice risk assessment data set, as in this notebook.
Simulate and analyze the feedback loops in predictive policing.
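One simple way to simulate the feedback loop is an urn-style model, sketched here under simplifying assumptions: two districts with identical true crime rates, one patrol per day sent in proportion to previously recorded incidents, and crime recorded only where police patrol. The rates, the starting counts, and the `simulate` function are all hypothetical.

```python
import random


def simulate(days=10_000, true_rates=(0.3, 0.3), initial=(1, 1), seed=0):
    """Hypothetical two-district model. Each day one patrol goes to a
    district chosen with probability proportional to its recorded incidents
    so far; a crime is recorded only in the patrolled district."""
    rng = random.Random(seed)
    recorded = list(initial)
    for _ in range(days):
        district = 0 if rng.random() < recorded[0] / sum(recorded) else 1
        if rng.random() < true_rates[district]:
            recorded[district] += 1
    return recorded


# Both districts have the same true crime rate, but the recorded share for
# district 0 converges to a random value set by early luck, not to 50/50.
for seed in range(5):
    r = simulate(seed=seed)
    print(seed, round(r[0] / sum(r), 2))
```

Varying `true_rates` and `initial` shows how the recorded data can end up reflecting past patrol decisions more than underlying crime.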

The Surveillance State

Issues

China’s inept “social credit system” and the much more sophisticated surveillance system used by police in Xinjiang.
Surveillance by landlords, including video cameras and social media monitoring, is being used to harass and evict tenants.
The potential costs of sharing your heart rate and other health data.
Western companies selling surveillance tech to authoritarian regimes, and Chinese products collecting data for the government.

Scenarios

Data mining tools used for investigative journalism are re-purposed for harassment
China’s social credit system grows up and is applied to users worldwide to enforce authoritarian norms.
Police facial recognition cameras effectively track every citizen’s location, bypassing Fourth Amendment protections on tracking.

Discussions

What are the technical, legal, and social factors that prevent law enforcement from abusing mass surveillance in each country? How will your technology interact with these factors?

Example Projects

With their prior permission, investigate a classmate through public information only. What can you correctly infer about their life?
Publicly display your heart rate for a week and report your results.

Data Control & Monetization

Issues

Data privacy law primer, including GDPR and HIPAA.
Inadvertent collection of data, e.g. Google’s Wi-Fi data collection and Mixpanel’s collection of passwords.
Data leaks due to mistakes and hacks.
The effect of making ostensibly “public” data more available or interpretable. E.g. Graffiti tracker, The Journal News’ gun map.

Tools

Redaction and minimization. Differential privacy, through the example of the new privacy changes for the 2020 Census.
Location data. How much it reveals, how easy it is to de-anonymize.
Health data. Correlations with life outcomes. Regulatory issues.
Issues of personalized recommendations and ads, e.g. targeting ads to pregnant women.
General effects of better prediction on the distribution of resources and risk. For example, if you had perfect information on someone’s future health, would that destroy the health insurance market?
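The core mechanism of differential privacy can be illustrated with noisy counting queries. This is a minimal sketch of the Laplace mechanism, not the Census Bureau’s much more elaborate 2020 system: because one person can change a count by at most 1, adding Laplace noise with scale 1/ε to each count gives ε-differential privacy for that query. The table and the names in it are invented, and the first print shows the differencing attack the noise defends against.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def count(rows, predicate):
    return sum(1 for row in rows if predicate(row))


def private_count(rows, predicate, epsilon, rng):
    # A count changes by at most 1 when one person is added or removed
    # (sensitivity 1), so Laplace noise with scale 1/epsilon suffices.
    return count(rows, predicate) + laplace_noise(1 / epsilon, rng)


# Hypothetical table of (name, has_condition).
rows = [("ana", False), ("ben", True), ("cy", False), ("dee", True), ("eve", True)]


def has_condition(row):
    return row[1]


def has_condition_not_eve(row):
    return row[1] and row[0] != "eve"


# Differencing attack: two innocuous-looking counts reveal eve's status.
print("exact:", count(rows, has_condition) - count(rows, has_condition_not_eve))  # exact: 1

rng = random.Random(0)
noisy = (private_count(rows, has_condition, 0.5, rng)
         - private_count(rows, has_condition_not_eve, 0.5, rng))
print("noisy:", round(noisy, 2))
```

Privacy loss adds up across queries: under sequential composition the two noisy counts above together cost up to ε = 1.0, and an attacker who could repeat the same queries with fresh noise and average the answers would eventually recover the exact difference. Real deployments therefore track a total privacy budget, which is one way to frame the before-and-after comparison in the differential privacy API project.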

Discussions

What data do you collect? Split into small groups, discuss, and make a list; then merge the lists. Were any types of data missed by a group because it lacked someone with a particular perspective?
What rights would your users want with regard to their data? What problems will they have if they don’t have these rights?

Example Projects

Experiment with adding differential privacy to one of your APIs. How easy is it to learn personal information, via reconstructions from multiple API calls, before and after?
Reconstruct someone’s life from anonymized location data (someone in the class could give you theirs, or you could use the NYC taxi data or data from apps).
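Much of what location data reveals comes from routine: the place where a device usually sits at night is very often home, and the place it sits during working hours is very often work. A minimal sketch of that inference, assuming pings are (hour, latitude, longitude) tuples for one pseudonymous ID, using coordinate rounding as a crude stand-in for real clustering, and with hour ranges and coordinates chosen only for illustration:

```python
from collections import Counter


def grid_cell(lat, lon, digits=3):
    # Rounding to 3 decimal places groups points within roughly 100 m.
    return (round(lat, digits), round(lon, digits))


def likely_places(pings):
    """pings: list of (hour, lat, lon) for one pseudonymous ID.
    Hypothetical heuristic: most common night cell = home,
    most common working-hours cell = work."""
    night = Counter(grid_cell(lat, lon) for h, lat, lon in pings if h >= 22 or h < 6)
    day = Counter(grid_cell(lat, lon) for h, lat, lon in pings if 9 <= h < 17)
    home = night.most_common(1)[0][0] if night else None
    work = day.most_common(1)[0][0] if day else None
    return home, work


# Made-up pings for one ID.
pings = [
    (23, 40.70012, -73.95008), (2, 40.70018, -73.95011), (5, 40.70009, -73.95003),
    (10, 40.75121, -73.98208), (14, 40.75118, -73.98213),
    (19, 40.73001, -73.99002),
]
print(likely_places(pings))  # ((40.7, -73.95), (40.751, -73.982))
```

Once home and work are known, joining them against public records is often enough to put a name to the pseudonymous ID.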

Implicit Trust and User Understanding

Issues

No one reads EULAs
Unroll.me was reading your email for Lyft receipts and reporting to Uber
Dark Patterns in UI design

Example Projects

Take one day of your browser history, re-visit every site. Read the EULAs and record anything that surprises you.
Document the dark patterns you encounter on these sites.

Hateful & Criminal Actors

Issues

The challenge of platform counter-terrorism, from Facebook’s point of view.
Bibliography of papers on online harassment and machine learning.
Attacks on image recognition algorithms: wear a t-shirt and confuse an AI.
It is now possible to run automated spear-phishing.
The dark web, anonymity, and security, as told through the story of The Silk Road.
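Physical attacks like the t-shirt rely on the same weakness as the simplest adversarial examples: a small, deliberately chosen change to the input pushes it across the model’s decision boundary. A minimal sketch of the fast gradient sign method (FGSM) against a hypothetical logistic-regression classifier with made-up weights and input (real attacks target deep networks, where the gradient has to be computed by backpropagation):

```python
import math


def score(w, b, x):
    # Logistic regression: estimated probability that x is the target class.
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, x, eps):
    # The gradient of z with respect to x is w, so moving each pixel by eps
    # against sign(w) lowers z by eps * sum(|w|), the largest drop possible
    # for a change of at most eps per pixel (ignoring clipping to [0, 1]).
    return [min(1.0, max(0.0, xi - eps * sign(wi))) for wi, xi in zip(w, x)]


# Hypothetical 100-"pixel" model and input; the numbers are made up.
w = [0.5, -0.3] * 50
b = -1.0
x = [0.6] * 100
x_adv = fgsm(w, x, eps=0.15)
print(round(score(w, b, x), 3), round(score(w, b, x_adv), 3))
```

Changing each “pixel” by only 0.15 drops the score from about 0.99 to about 0.27, flipping the prediction.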

Example Projects

Build a hate speech classifier. Does it work well enough to be useful? What have you learned about the complexity of this problem?
Estimate the percentage of bitcoin transactions that are used for criminal activity.
Pick a platform or product. Come up with a plan to use it for criminal activity, including the security measures you would take.
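One reason “useful” is a high bar for a hate speech classifier is that the target content is usually rare. By Bayes’ rule, the fraction of flagged posts that are actually hateful (precision) is sensitivity × base rate / (sensitivity × base rate + (1 - specificity) × (1 - base rate)). With hypothetical numbers, a classifier that catches 95% of hateful posts and correctly clears 95% of the rest has a precision of only about 16% when 1% of posts are hateful. The short calculation below repeats this across a few assumed base rates.

```python
def precision(sensitivity, specificity, base_rate):
    # Fraction of flagged items that are truly positive (Bayes' rule).
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier with 95% sensitivity and 95% specificity.
for base_rate in (0.5, 0.1, 0.01, 0.001):
    print(base_rate, round(precision(0.95, 0.95, base_rate), 3))
```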

Jonathan Stray

Information, culture, and belief
