I’m a tech interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That combined with not having a practical use leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing).
With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL.
Can someone give me a technical explanation for how one would come to that conclusion? I’d love if you could pass technical documentation for that.
The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
If he knew the domain, he would know this isn’t an issue. If he knew the technology he would be able to see the constraint and following investigation, reach the conclusion that it’s not an issue.
The man continues to be a malignant moron
The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.
Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.
Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people and I don’t know how to find the start of the thread on twitter since I only use it when I accidentally click on a link to it.
https://www.ssa.gov/history/hfaq.html
Q20: Are Social Security numbers reused after a person dies?
A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.
Take this with a grain of salt as I’m not a dev, but do work on CMS reporting for a health information tech company. Depending on how the database is designed an SSN could appear in multiple tables.
In my experience reduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.
A given SSN appearing in multiple tables actually makes sense. To someone not familiar with SQL (i.e. at about my level of understanding), I could see that being misinterpreted as having multiple SSN repeated “in the database”.
Of all the comments ao far, I find yours the most compelling.
Theoretically, yeah, that’s one solution. The more reasonable thing to do would be to use the foreign key though. So, for example:
SSN_Table
ID | SSN | Other info
Other_Table
ID | SSN_ID | Other info
When you want to connect them to have both sets of info, it’d be the following:
SELECT * FROM SSN_Table JOIN Other_Table ON SSN_Table.ID = Other_Table.SSN_ID
EDIT: Oh, just to clear up any confusion, the SSN_ID in this simple example is not the SSN itself. To access that in this example query, it’d by SSN_Table.SSN
This is true, but there are many instances where denormalization makes sense and is frequently used.
A common example is a table that is frequently read. Instead of going to the “central” table the data is denormalized for faster access. This is completely standard practice for every large system.
There’s nothing inherently wrong with it, but it can be easily misused. With SSN, I’d think the most stupid thing to do is to use it as the primary key. The second one would be to ignore the security risks that are ingrained in an SSN. The federal government, being large as it is, I’m sure has instances of both, however since Musky is using his possy of young, arrogant brogrammers, I’m positively certain they’re completely ignoring the security aspect.
To be a bit more generic here, when you’re at government scale you’re generally deep in trade-off territory. Time and space are frequently opposed values and you have to choose which one is most important, and consider the expenses of both.
E.g. caching is duplicating data to save time. Without it we’d have lower storage costs, but longer wait times and more network traffic.
Yeah, no one appreciates security.
I probably overused that saying to explain it: ‘if theres no break ins, why do we pay for security? Oh, there was a break in - what do we even pay security for?’
Yeah, databases are complicated and make my head hurt. Glancing through resources from other comments, I’m realizing I know next to nothing about database optimization. Like, my gut reaction to your comment is that it seems like unnecessary overhead to have that data across two tables - but if one sub-dept didn’t need access to the raw SSN, but did need access to less personal data, j could see those stored in separate tables.
But anyway, you’re helping clear things up for me. I really appreciate the pseudo code level example.
It’s necessary to split it out into different tables if you have a one-to-many relationship. Let’s say you have a list of driver licenses the person has had over the years, for example. Then you’d need the second table. So something like this:
SSN_Table
ID | SSN | Other info
Driver_License_Table
ID | SSN_ID | Issue_Date | Expiry_Date | Other_Info
Then you could do something like pull up a person’s latest driver’s license, or list all the ones they had, or pull up the SSN associated with that license.
I think a likely scenario would be for name changes, such as taking your partner’s surname after marriage.
It is common for long lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and the other might have it with hyphens in the same database.
Hell, I work in a state agency and one of our older databases has a dozen tables with databases.
- One has the whole thing as a long int: 222333444
- One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
- One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
- One has the whole thing with parenthesis, a space, and a hyphen as a string: (222) 333-4444
The main reason for the discrepancy is not looking at what was used before or not understanding that they can always change the formatting when displayed so they don’t need to include the parenthesis or hyphens in the database itself.
There can be duplicate SSNs due to name changes of an individual, that’s the easiest answer. In general, it’s common to just add a new record in cases where a person’s information changes so you can retain the old record(s) and thus have a history for a person (look up Slowly Changing Dimensions (SCD)). That’s how the SSA is able to figure out if a person changed their gender, they just look up that information using the same SSN and see if the gender in the new application is different from the old data.
Another accusation Elon made was that payments are going to people missing SSNs. The best explanation I have for that is that various state departments have their own on-premise databases and their own structure and design that do not necessarily mirror the federal master database. There are likely some databases where the SSN field is setup to accept strings only, since in real life, your SSN on your card actually has dashes, those dashes make the number into a string. If the SSN is stored as a string in a state database, then when it’s brought over to the federal database (assuming the federal db is using a number field instead of text), there can be some data loss, resulting in a NULL.
JFC: married individuals, or divorced and name change back, would be totally fucked. Just on the very surface is his fuckery.
Hypothetically you could have a separate “previous names” table where you keep the previous names and on the main table you only keep the current name. There are a lot of ways to design a db to not unnecessarily duplicate SSNs, but without knowing the implementation it’s hard to say how wrong Musk is. But it’s obvious he doesn’t know what he’s talking about because we know that due to human error SSN-s are not unique and you can’t enforce uniqueness on SSN-s without completely fucking up the system. Complaining about it the way he did indicates that he doesn’t really understand why things are the way they are.
Another accusation Elon made was that payments are going to people missing SSNs.
A much simpler answer is thatnot all Americans actually have an SSN. The Amish for example have religious objections towards insurance, so they were allowed to opt out from social security and therefore don’t get an SSN.It’s true that some Americans don’t have Social Security numbers, but those Americans can’t collect Social Security benefits unless/until they get one.
My bad, I thought it was about payments in general (including other programs) but it says social security database. Sorry.
Having never seen the database schema myself, my read is that the SSN is used as a primary key in one table, and many other tables likely use that as a foreign key. He probably doesn’t understand that foreign keys are used as links and should not be de-duplicated, as that breaks the key relationship in a relational database. As others have mentioned, even in the main table there are probably reused or updated SSNs that would then be multiple rows that have timestamps and/or Boolean flags for current/expired.
Is this is true, then by this time we are all fucked. Like Monday someone checks their banking or retirement and it all gone. That’s gonna be a crazy day.
I hope they’re not using the actual SSN as the primary key. I hope its a big ass number that is otherwise unrelated.
It’s so basic that documentation is completely unnecessary.
“De-duping” could mean multiple things, depending on what you mean by “duplicate”.
It could mean that the entire row of some table is the same. But that has nothing to do with the kind of fraud he’s talking about. Two people with the same SSN but different names wouldn’t be duplicates by that definition, so “de-duping” wouldn’t remove it.
It can also mean that a certain value shows up more than once (eg just the SSN). But that’s something you often want in database systems. A transaction log of SSN contributions would likely have that SSN repeated hundreds of times. It has nothing to do with fraud, it’s just how you record that the same account has multiple contributions.
A database system as large as the SSA has needs to deal with all kinds of variations in data (misspellings, abbreviations, moves, siblings, common names, etc). Something as simplistic as “no dupes anywhere” would break immediately.
SSN is also not a valid unique key, there have been situations with multiple people issued the same SSN:
Just read the format of the us ssn in that wikipedia. That wasnt a smart format to use lol. Only supports 99*999 ( +/- 100k ) people per area code. No wonder numbers are reused.
In some countries its birthday+sequence number encoded with gender+checksum and that has been working since the 80’s.
Before that was a different number, but it wasnt future proof like the us ssn so we migrated away in the 80’s :')In my country the only way that someone has the same number is if someone was born on the same day (±1 century), in the same city and has the same name and family name. Is extremely difficult to have duplicates in that way (exception: immigrants, because the “city code” is the same for the whole foreign country, so it’s not impossible that there are two Ananya Gupta born on the same day in the whole India)
Oh ye, our system wouldnt fit india as its limited to 500 births a day ( sequence is 3, digits and depending if its even or uneven describes your gender ). Your system seems fine to me and beats the us system hands down haha
It’s more than just SQL. Social Security Numbers can be re-used over time. It is not a unique identifier by itself.
i’ve heard conflicting reports on this, i have no idea to what degree this is true, but i would be cautious about making this statement unless you demonstrate it somehow.
As read on wikipedia ( https://en.wikipedia.org/wiki/Social_Security_number ) the format only allows +/- 100k numbers per area code ( which is also limited to 999 codes? ), so over time you are forced to reuse some codes. In total the format allows 99m unique codes, and the us currently has 334mil people sooooo :')
TL;DR de-deuplication in that form is used to refer a technique where you reference two different pieces of data in the file system, with one single piece of data on the drive, the intention being to optimize file storage size, and minimize fragmentation.
You can imagine this would be very useful when taking backups for instance, we call this a “Copy on Write” approach, since generally it works by copying the existing file to a second reference point, where you can then add an edit on top of the original file, while retaining 100% of the original file size, and both copies of the file (its more complicated than this obviously, but you get the idea)
now just to be clear, if you did implement this into a DB, which you could do fairly trivially, this would change nothing about how the DB operates, it wouldn’t remove “duplicates” it would only coalesce duplicate data into one single tree to optimize disk usage. I have no clue what elon thinks it does.
The problem here, as a non programmer, is that i don’t understand why you would ever de-duplicate a database. Maybe there’s a reason to do it, but i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another, or what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon doesn’t know what “de-duplication” is, and i don’t know why you would ever want that in a DB, seems like a really good way to explode everything,
i genuinely cannot think of a single instance where you would want to delete one entry, and replace it with a reference to another
Well, there’s not always a benefit to keeping historical data. Sometimes you only want the most up-to-date information in a particular table or database, so you’d just update the row (replace). It depends on the use case of a given table.
what elon is implying here (remove “duplicate” entries, however that’s supposed to work)
Elon believes that each row in a table should be unique based on the SSN only, so a given SSN should appear only once with the person’s name and details on it. Yes, it’s an extremely dumb idea, but he’s a famously stupid person.
Ssn being unique isnt a dump idea, its a very smart idea, but due to the us ssn format its impossible to do. Hence to implement the idea you need to change the ssn format so it is unique before then.
Also, elons remark is stupid as is. Im sure the row has a unique id, even if its just a rowid column.
Billionaires are stealing our dollars, tax or otherwise.
To me I’m not really sure what his reply even means. I think it’s some attempt at a joke (because of course the government uses SQL), but I figure the joke can be broken down into two potential jokes that fail for different, embarrassing reasons:
Interpretation 1: The government is so advanced it doesn’t use SQL - This interpretation is unlikely given that Elon is trying to portray the government as in need of reform. But it would make more sense if coming from a NoSQL type who thinks SQL needs to be removed from everywhere. NoSQL Guy is someone many software devs are familiar with who takes the sometimes-good idea of avoiding SQL and takes it way too far. Elon being NoSQL Guy would be dumb, but not as dumb as the more likely interpretation #2.
Interpretation 2: The government is so backward it doesn’t use SQL - I think this is the more likely interpretation as it would be consistent with Elon’s ideology, but it really falls flat because SQL is far from being cutting-edge. There has kind of been a trend of moving away from SQL (with considerable controversy) over the last 10 years or so and it’s really surprising that Elon seems completely unaware of that.
My guess is that he thinks SQL is an app or implementation like MS-SQL. It would be pretty surprising if the government didn’t use SQL as in relational databases, but if it doesn’t it’s even more unlikely that he understands even the first part over whether having duplicate SS numbers is in any way unexpected or unreasonable. Most likely one of the junior devs somewhere along the lines misunderstood a query and said something uninformed and mocking, and he took that as a good dig to toss into a tweet.
Thanks for genuine response. Lol, most who interpret my question that way you did don’t seem interested in a good faith discussion. But ol’ boy is def tripping if he thinks SQL isn’t used in the government.
Big thing I’m intending to pry at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
it’s probably using some sort of proprietary home grown database, because it’s probably old enough that no database could support what they needed, could be wrong on that one, but it was my best guess.
It’s an insanely idiotic thing to say. Federal government IT is myriad, and done at a per agency level. Any relational database system, which the federal government uses plenty of, uses SQL in one way or another. Elon doesn’t know what he is talking about at all, and is being an ultimate idiot about this. Even in the context of mainframe projects thatif we are giving elong the benefit of doubt about referring to, most COBOL shoprbibknow have adapted to addressing internal data records using an SQL interface, although obviously in that legacy world it is insanely fractured and arcane.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
Another commentor pointed out a legitimate use case, but it’s not even worth thinking about that much. De-duplocated is usually a word you use in data science to talk aboutakong sure your dataset is “hygienic” and that you aren’t duplicating data points. A database is much different because it is less about representing data, and more about storing it in a way that allows you to perform transactions at scale - retrieval, storage, modification, etc. Relational databases are analyzed in terms of data cardinality which essentially describes tradeoffs in representation between speed of retrieval (duplications good) vs storage efficiency (duplications bad).
The issue is that Elon is so vague and so off the mark that it is very hard to believe that he even has the first clue about what he is a talking about. Even you are confused just by reading it. It is all a tactic to convince others that he is smarter than he is while doing extreme damage to the hardworking people that actually make this stuff possible. Have you noticed that the man has never come to a conclusion that wasn’t in his interests? This is not honest intellectualism, or discussion based on technical merit. It’s self serving propaganda.
Well, if someone changes their name you’d add a new record with the same SSN to hold their new name, that way it keeps the records consistent with the paperwork; old papers say their old name and reference the retired record, new papers use their new name and reference the new record.
You can use the SSN as the key to find all records associated with a person, it doesn’t have to be a single row per SSN, in fact that would make the data harder to manage and less accurate.
E.g. if someone changes their last name after getting married, it could be useful to be able to have their current and former name in the database for reference.
If there are timestamped records for things like name changes then you’d get “duped” SSNs
“The government” is multiple agencies and departments. There is no single computer system, database, mainframe, or file store that the entire US goverment uses. There is no standard programming language used. There is no standard server configuration. Each agency is different. Each software project is different.
When someone says the government doesn’t use sql, they don’t know what they are talking about. It could be refering to the fact that many government systems are ancient mainframe applications that store everything in vsam. But it is patently false that the government doesn’t use sql. I’ve been on a number of government contracts over the years, spanning multiple agencies. MsSQL was used in all but one.
Furthermore, some people share SSNs, they are not unique. It’s a common misconception that they are, but anyone working on a government software learns this pretty quickly. The fact that it seems to be a big shock goes to show that he doesn’t know what he is doing and neither do the people reporting to him.
Not only is he failing to understand the technology, he is failing to understand the underlying data he is looking at.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the Vice Bro doesn’t understand how SQL works).
I’m not aware of any instance where two people share an SSN though. The Social Security Administration even goes as far as to say they don’t recycle the SSNs of dead people (its linked a couple times in other comments and Voyager doesn’t let me save drafts of comments, I’ll make an edit to this comment with that link for you).
Can you point me to somewhere showing multiple people can share an SSN?
Edit: as promised: The Social Security FAQ page
Assuming the whole “duplicate SSN” thing isn’t just a complete fabrication, we have no idea what table he was even looking at! A table of transactions e.g. would have a huge number of duplicate SSNs.
The fact that SSN aren’t singular identifiers has been public knowledge for quite a while. ID analytics has shown in over a decade of studies that some people have multiple SSN attached to their name, while some (over five million) SSN are used by three or more living individuals. If you search “ID analytics SSN” you’ll find loads of articles reporting on this dating back to 2010 and a bit before.
Because SQL is everywhere. If Musk knew what it was, he would know that the government absolutely does use it.
This explanation makes no sense in the context of OP’s question, given the order of comments…
139 comments and no one addresses his use of a slur.
Because that’s really just to be expected at this point, and what his audience would want…
Better to focus on constantly poking at him for being dumb, which he and his fans hate, rather than give them what they want, ie being upset at their hateful language
it seems that nobody really cares about the word retard anymore, it’s quite funny how it went from super common language, to being less common, to people just saying it again now.
I’m curious how many people actually consider the word a slur, and how many people even care these days.
Musk’s statement about the government not using SQL is false. I worked for FEMA for fourteen years, a decade of which was as a Reports Analyst. I wrote Oracle SQL+ code to pull data from a database and put it into spreadsheets. I know, I know. You’re shocked that Elon Musk is wrong. Please remain calm.
Yeah, obviously ol’ boy is tripping if he thinks SQL isn’t used in the government.
Big thing I’m prying at is whether there would be a legitimate purpose to have duplicated SSNs in the database (thus showing the First Bro doesn’t understand how SQL works).
The ignorance of Elon is truly concerning, but somehow the worst part to me is Elon calling someone a retard for pointing that out.
He called a rescuer a pedophile for trying to rescue children…
After his own offer to rescue the children was turned down
Ableist, racist white supremacist doing their ableist-racist-white-supremacist thing.