Tuesday, February 28, 2023

30 Interview Questions for a Database Administrator and Developer

This blog post summarizes the type of technical questions I would ask candidates for a Microsoft SQL Server data platform administrator and database developer role.

Hopefully this helps both candidates and managers prepare for interviews. I have no qualms about providing the brief answers, because your interview, like mine, should be:

  • behavioral: based on scenarios, not multiple-choice answers.
  • open-ended: asking for an explanation, not a single-word answer.
  • conversational: testing how the candidate would explain the topic to a client or colleague.
  • applicable: relevant to your environment and to the job description.

When I was the manager of a SQL Server managed service provider and a principal consultant at a consulting company, I interviewed and hired database administrators to serve as consultants and remote DBAs, as well as database developers for our internal app dev projects.

I've divided the 30 questions into four categories.

Thursday, November 10, 2022

I'm speaking at PASS Data Community Summit 2022


Looking forward to speaking in Seattle at next week's PASS Data Community Summit, hosted by Redgate. I'll be speaking both as a representative of my role on the Microsoft Database Docs team and as a data professional.

On Wednesday, Nov 16, you'll find me in the main exhibit hall in the Microsoft Booth Theater (#217) at 5:30 PM PT, giving a presentation on contributing to Microsoft Docs. Hope to see you there. If prizes are available to give away, you know I'll give 'em up.

On Thursday afternoon, Christine and I are presenting together on a topic that is important to us, Ethics in Modern Data. I bring my years as a data professional, and Christine brings her experience and a master's degree in organizational psychology, combined with our shared passion for history, civil rights, and technology. We'll discuss issues ripped from the news headlines and drawn from history; these will frame a discussion about bias in data collection and analysis, and about our responsibilities as data professionals. I hope we spark your interest in these important topics that shape our data industry.

On Friday morning, I'll be presenting a full hour on Database Docs. We'll discuss how the Docs work behind the scenes and how you can contribute to open-source docs via GitHub. This will also be an interactive feedback panel for the entire Docs platform inside Microsoft Learn. Hope to see you there, to inspire you to contribute to the Docs the entire data community uses daily, and to answer any questions.


Thursday, June 02, 2022

Speaking on Ethics in Modern Data at Improving Edge 2022


Christine and I are looking forward to speaking at the Improving Edge conference, co-presenting our session on Monday, June 6th. Our presentation on “Ethics in Modern Data” features topics relevant to modern developers and data professionals, using historical and current events to discuss ethics in data collection and analysis. 

This is an important topic that lives at the crossroads of our careers (Christine's in organizational psychology and human resources, mine in data), our work and volunteerism in civic non-profits, and our joint passion for history and civil rights. It's important to understand that when dealing with bias, outcomes matter; intentions don't.

Our slide deck, references, and citations are available for download.

Thursday, September 16, 2021

Ethics in Modern Data at Music City Tech 2021

On Sept 16, Christine and I presented our joint session on Ethics in Modern Data at the Music City Tech 2021 virtual conference. Thanks for joining us!

This session explores a variety of considerations that modern data scientists and data practitioners must account for when gathering and presenting data, including bias, construct analysis, and machine learning. We'll discuss examples from history and current headlines.

This is an important topic that lives at the crossroads of our careers (my wife's in organizational psychology and human resources, mine in data), our work in civic non-profits, and our joint passion for history and civil rights. It's important to understand that when dealing with bias, outcomes matter; intentions don't.

Our slide deck and all citations and references will be made available for download here.

Monday, August 02, 2021

Ethics in Modern Data this Saturday with the South Florida Data Geeks

This Saturday, Aug 7, Christine and I will co-present our session on Ethics in Modern Data.

This session will explore a variety of considerations that modern data scientists and data practitioners must account for when gathering and presenting data, including bias, construct analysis, and machine learning. We'll discuss examples from history and headlines.

This is an important topic that lives at the crossroads of our careers (my wife's in organizational psychology and human resources, mine in data), our work in civic non-profits, and our joint passion for history and civil rights. It's important to understand that when dealing with bias, outcomes matter; intentions don't. While many of our examples come from the historical context of the United States, not all do, and we have added additional context for international audiences.

Our slide deck and all citations and references will be made available for download here.

Register for the virtual event here, and we'll see you Saturday afternoon at 2 PM ET.

Tuesday, July 27, 2021

Ethics in Modern Data at the Baton Rouge Analytics and Intelligence Network (BRAIN)


On July 27, Christine and I will co-present our session on Ethics in Modern Data. Looking forward to speaking to yet another technical user community working to jumpstart itself after a COVID-induced hiatus.

This session will explore a variety of considerations that modern data scientists and data practitioners must account for when gathering and presenting data, including bias, construct analysis, and machine learning. We'll discuss examples from history and headlines.

This is an important topic that lives at the crossroads of our careers (my wife's in organizational psychology and human resources, mine in data), our work in civic non-profits, and our joint passion for history and civil rights. It's important to understand that when dealing with bias, outcomes matter; intentions don't. While many of our examples come from the historical context of the United States, not all do, and we have added additional context for international audiences.

Our slide deck and all citations and references will be made available for download here.

Register here for the virtual event at 5 PM CT.

Wednesday, April 07, 2021

"Ethics in Modern Data" at the Inland Northwest Data Professionals Association

Looking forward to speaking to one of our new home turf's data organizations, the Inland Northwest Data Professionals Association. My spouse Christine and I will present a talk we're both passionate about, bringing our career focuses to the topics of data, construct analysis, historical and current bias, and modern machine learning algorithms. We'll discuss the ethics of bias in data, both historic and ripped from the headlines.

Join us at noon PT on April 8 here: https://www.meetup.com/inland-northwest-data-professionals-association/events/277102053/

Slide deck available for download here.


Wednesday, September 23, 2020

Confounding Variables from Historical Bias

Note: co-authored with Christine Assaf, originally published in the now-defunct PASS Blog.

Historical data analysis that is naïve to past discrimination is doomed to parrot bias. How do we combat bias in our data analytics? 

First, we already know that complex, multi-talented teams are best-suited to face complex, multi-faceted problems. A more diverse group of researchers and engineers is more likely to recognize and think harder about bias problems that may impact them personally. “It’s difficult to build technology that serves the world without first building a team that reflects the diversity of the world,” wrote Microsoft President Brad Smith in Tools and Weapons (2019). 

Second, data is often collected through real-world interactions, subject to real-world bias. No real-world data exists in a vacuum. We should understand that bias in data collection and analysis may be inevitable, but it is not acceptable. It is not your responsibility to end bias (though that's a worthy cause), but rather to be proactively informed and transparent.

Look for systemic outcomes, not intentions; only in outcomes can potential disparate impacts be measured. Let's review a couple of examples.

Many Americans are familiar with the ACT, an exam many students take as part of the college application process. In 2016, ACT acknowledged an achievement gap in composite scores based on family income. According to ACT's own data, there is a 3-4 point difference in scores between poorer and wealthier households, and the gap continues to widen.

Credit to ACT for disclosing their research. Transparency is part of accounting for bias in historical data and data collection and is critically important to furthering a larger conversation about inequality of opportunity. 

Recently, more than 1,000 American institutions of higher learning have adopted test-optional admissions policies, meaning they no longer require the ACT (or the SAT, a similar exam) on applications. An overwhelming number of studies suggest that the ACT does NOT predict college graduation outcomes as strongly as other factors, including high school GPA and household income.

Knowing the variables involved in your analysis is important. When conducting analysis, researchers must review, identify, and anticipate variables, including confounding variables; you will never find them unless you are looking for them.
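To make the idea concrete, here is a minimal sketch in Python using purely synthetic, hypothetical data (not ACT's) of how a variable can look predictive only because it shares a confounder, in this case household income, with the outcome. Stratifying by the confounder makes the apparent relationship largely vanish:

    # Minimal sketch, synthetic data only: a "test score" appears to predict
    # graduation, but both are actually driven by the confounder (income).
    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000

    income = rng.normal(0, 1, n)                          # standardized household income
    score = 0.8 * income + rng.normal(0, 1, n)            # score largely driven by income
    graduated = (0.8 * income + rng.normal(0, 1, n)) > 0  # outcome also driven by income

    # Naive view: score and graduation look correlated overall.
    print("overall corr:", round(float(np.corrcoef(score, graduated)[0, 1]), 3))

    # Stratify by the confounder: within narrow income bands,
    # the apparent relationship largely disappears.
    bands = np.digitize(income, np.quantile(income, [0.2, 0.4, 0.6, 0.8]))
    for b in range(5):
        mask = bands == b
        r = np.corrcoef(score[mask], graduated[mask])[0, 1]
        print(f"income band {b}: corr = {r:.3f}")

The point is not the specific numbers; it is that the confounder is only found because the analysis deliberately looks for it.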

This is why a new rule proposed by the United States Department of Housing and Urban Development in 2019 stirred a massive reaction from technologists and ethicists alike. The proposed rules, yet to be implemented, would make it nearly impossible for a company to be sued when racial minorities are disproportionately denied housing, as mortgage lenders or landlords could simply blame their algorithm to avoid penalty.

Centuries of racially divided housing policies in the United States evolved into legalized housing discrimination known as redlining, ensconced in federally backed mortgage lending starting in the 1930s. The long history of legal racial housing discrimination in the United States was arguably not directly addressed until the 1977 Community Reinvestment Act. Yet today, 75% of neighborhoods "redlined" on government maps 80 years ago continue to struggle economically, and minority communities continue to be denied housing loans at far higher rates than their white counterparts. If discriminatory outcomes in housing are to change for the better, the algorithmic orchestration of mortgage lending should not be excused from scrutiny.
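This article doesn't prescribe a metric, but as one illustration of measuring outcomes rather than intentions, here is a minimal sketch borrowing the "four-fifths rule" from US employment-law guidance (not housing law); all figures and group names are hypothetical:

    # Minimal sketch, hypothetical figures: flag potential disparate impact
    # when one group's approval rate falls below 80% of the most-favored
    # group's rate (the "four-fifths rule").
    approvals = {
        # group: (applications, approvals) -- hypothetical numbers
        "group_a": (1_000, 620),
        "group_b": (1_000, 410),
    }

    rates = {g: ok / total for g, (total, ok) in approvals.items()}
    reference = max(rates.values())  # most-favored group's approval rate

    for group, rate in rates.items():
        ratio = rate / reference
        flag = "POTENTIAL DISPARATE IMPACT" if ratio < 0.8 else "ok"
        print(f"{group}: approval rate {rate:.2f}, ratio {ratio:.2f} -> {flag}")

A check like this proves nothing by itself, but it makes outcomes, rather than anyone's intentions, the thing being measured.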

In both cases, we examined industries where algorithms draw from data revealing larger societal outcomes. These outcomes are the result of trends of economic inequality and a pervasive opportunity gap. Are such data systems to be trusted simply because they are the result of an algorithm, and thereby inherently ethical? No, not without scrutiny.

These are examples of why data professionals must be aware of the historical and societal contexts from which our data is drawn, and of how the outcomes of our data findings could be leveraged. Would our outcomes contribute to justice? For example, the industries of financial marketing, healthcare, criminal justice, education, and public contracting have histories checkered with injustice. We should learn that history.

Transparency is desirable; it is needed to aid an informed societal conversation. We should not assume that an algorithm can overcome historical biases or other latent discrimination in its source data. Informed scrutiny should gaze upon historical data with a brave and honest eye.

Wednesday, August 26, 2020

Measure Ethical Data Analysis by Outcomes, Not Intent

Note: co-authored with Christine Assaf, originally published in the now-defunct PASS Blog.

In January 2020, Robert Williams was arrested on his front lawn in a suburb of Detroit, Mich., after police scanned state driver's license photos and matched his face to grainy surveillance camera footage. He was held for 30 hours in police custody on suspicion of the theft of five luxury watches months earlier. He was innocent. Amid the resulting lawsuit and public scrutiny, the Detroit Chief of Police admitted in a public meeting in June 2020 that the department's facial recognition software would misidentify suspects 95-97% of the time.

A 2019 NIST study found higher error rates for darker-skinned samples across 99 different facial recognition software providers, including those used in the Michigan State Police software package.

In a 2017 MIT study, leading facial recognition technologies provided by Microsoft, Face++, and IBM were put to the test on front-facing, portrait-style headshots. Their accuracy was lowest when evaluating darker-skinned faces. For example, in the case of IBM's Watson, there was a 34.4% difference in error rate between lighter males and darker females, while 93.6% of Azure Face API's gender identification failures were on darker-skinned subjects.
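As an illustration of how such audits work, here is a minimal sketch of disaggregating a classifier's error rate by subgroup instead of reporting one overall accuracy number. The records and subgroup labels are hypothetical; a real audit, like the studies above, would use thousands of labeled samples per subgroup:

    # Minimal sketch, hypothetical records: report error rates per subgroup,
    # not just an overall average that can hide disparities.
    from collections import defaultdict

    # (subgroup, predicted_label, true_label)
    records = [
        ("lighter_male", "male", "male"),
        ("lighter_male", "male", "male"),
        ("darker_female", "male", "female"),    # misclassification
        ("darker_female", "female", "female"),
        # ...thousands more labeled samples in a real audit
    ]

    totals, errors = defaultdict(int), defaultdict(int)
    for subgroup, predicted, actual in records:
        totals[subgroup] += 1
        if predicted != actual:
            errors[subgroup] += 1

    for subgroup in totals:
        rate = errors[subgroup] / totals[subgroup]
        print(f"{subgroup}: error rate {rate:.1%} ({errors[subgroup]}/{totals[subgroup]})")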

The reasons for the embarrassing inaccuracy are still being pursued and can only be speculated upon, from photo contrast levels to unrepresentative data sets used to train the models. However, it is the inaccurate outcomes themselves that are the cause for concern. The potential for a civil rights nightmare is clear and present.

Robert Williams, the Michigan man arrested based on a faulty match, is Black. In many Western countries, including the United States, communities of color are more scrutinized by the criminal justice system. There is identifiable discrimination in criminal justice outcomes, including ethnic and gender biases in sentencing, especially when judges have discretion.

Data collection and analysis without regard to the potential for disparate outcomes may reinforce and institutionalize a society’s discriminatory history. Appropriately, Microsoft, IBM, and Amazon announced in June that they would no longer license their facial recognition platforms for use by law enforcement.

Ethical outcomes don’t care about your intentions. Systems can only be judged by their impact, not by intentions vulnerable to revisionism and disconnected from outcomes. Algorithms, as we in the data community well understand, are not infallible, incorruptible oracles. We know better.

When we presented on this topic in May 2020 at a virtual conference, we were asked somewhat incredulously if we favored regulation of software development.

Frankly, our answer is yes.

Engineers who build bridges need a regulated licensing structure around the Professional Engineer stamp, because we don't want bridges collapsing. Doctors need board certifications to make sure people get evidence-based medical care. From the Therac-25 software bug to the Boeing 737 MAX, perhaps we as data professionals specifically, and software developers generally, do need some regulation. Or, at least, ethical commitments to practice.

Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence and a professor at the University of Washington, has proposed just that: a modified version of a medical doctor’s Hippocratic Oath. He proposed a Hippocratic Oath for artificial intelligence practitioners, with this statement at its heart: “I will consider the impact of my work on fairness both in perpetuating historical biases, which is caused by the blind extrapolation from past data to future predictions, and in creating new conditions that increase economic or other inequality.”

There is considerable need for this type of commitment to our quickly evolving use of modern datasets. Colin Allen, a researcher in Cognitive Science and History at the University of Pittsburgh, summarized the need nearly a decade ago: “Just as we can envisage machines with increasing degrees of autonomy from human oversight, we can envisage machines whose controls involve increasing degrees of sensitivity to things that matter ethically. Not perfect machines, to be sure, but better.”

Friday, May 01, 2020

Data Community #DataWeekender Europe 2020: Ethics in Modern Data

On Saturday, May 2, Christine and I presented at the Data Community #DataWeekender Europe 2020 conference, a free, all-online pop-up Microsoft data conference hosted by various professionals from around Europe. Our joint presentation, Ethics in Modern Data, is one of our favorites to present. It was awesome to present to an international audience of hundreds of data professionals, and we are grateful for the final time slot, which put us in mid-morning Saturday, Central US Time!

This is an important topic that lives at the crossroads of both of our careers (my wife's in organizational psychology and human resources, mine in data) and our joint passion for history and civil rights. It's important to understand that when dealing with bias, outcomes matter; intentions don't. While many of our examples come from the historical context of the United States, not all do, and we have added additional context for an international audience.

Our presentation slide deck with references is available here: https://github.com/williamadba/Public-Presentations/blob/master/DataWeekender%20Europe%202020/Ethics%20in%20Modern%20Data.pptx. Thanks to UserGroup.tv, a recording of an earlier delivery of our presentation is available here.

Saturday, February 08, 2020

Ethics in Modern Data presentation at SQLSaturday Austin BI Edition 2020

Excited to launch another new presentation, the first co-presented with my wife, on Ethics in Modern Data. We'll explore ethical considerations around historical bias, data collection and analysis, and disparate impact, with tons of well-documented case studies and examples.

This is an important topic that lives at the crossroads of both of our careers (my wife's in organizational psychology and human resources, mine in data) and our joint passion for history and civil rights. The effort of researching, paring down, and rehearsing our presentation together as a couple has been an exciting first for us. It's important to understand that when dealing with bias, outcomes matter; intentions don't.

Thanks to the many of you who chimed in during the presentation, including with further reading and book recommendations for us all!

If you'd like to review any of the topics or case studies we covered, our slides and citations are available for download here.

Thanks to UserGroup.tv, a recording of our presentation is available here.

My wife also presented at SQLSaturday Austin BI Edition 2020 on Saturday afternoon, on "Mastering your Resume & Interview: Tips to Get Hired."