
Wednesday, June 05, 2019

Actual Emails: Will MSOLEDB work for connecting to an Availability Group?

Scenario:

We need to configure an existing legacy application from an external vendor to talk to our new SQL Server 2017 Availability Group, which spans multiple subnets. In the end, that last bit is the key. The old data provider MSOLEDB will work for connecting to single-subnet Availability Group listeners, but won't work consistently when connecting to a multisubnet Availability Group's listener. The key is the ability to specify MultiSubNetFailover=True in the connection parameters.

Client's software vendor:
The connection used is the Microsoft OLE DB Provider for SQL Server that is supplied by Microsoft to create the Data Link Properties. When configuring our Data Link, we use the "Microsoft OLE DB Provider for SQL Server". The connection string is formatted: 
Provider=SQLOLEDB.1;Password="whatever";Persist Security Info=True;User ID=username;Initial Catalog=Test;Data Source=ServerName 
Answer:
Good info, but we do need to make a change here. SQLOLEDB is the provider from back in the SQL 2000 era, and its use is not recommended for new development. It was replaced by the SQL Server Native Client (SNAC), which has since been replaced by MSOLEDBSQL (linked below). Upgrading the provider from SQLOLEDB should be easy and transparent, with no negative impact.
Here’s why we need to upgrade the data provider to talk to our SQL Availability Group. The listener for a multi-subnet Availability Group actually has two IPs. When you run this at a command line:
Nslookup SQLListenerNameWhatever 
You get back an IP address for the listener in each subnet (in our case, two), one per subnet hosting a replica SQL instance in the Availability Group.
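You can see the same registrations from inside SQL Server. Here's a minimal query against the Availability Group listener DMVs (a sketch, assuming you have VIEW SERVER STATE permission):
SELECT l.dns_name, ip.ip_address, ip.ip_subnet_mask, ip.state_desc
FROM sys.availability_group_listeners AS l
INNER JOIN sys.availability_group_listener_ip_addresses AS ip
ON l.listener_id = ip.listener_id;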
When a connection string uses MultiSubNetFailover=True and connects to the Availability Group listener name (not the IP or either SQL Server instance name), BOTH IPs are tried simultaneously and immediately, and the driver talks only to the IP that replies: the primary replica. After a failover, the other IP begins to reply immediately, so there is no delay in reconnecting when a failover occurs.
Without specifying MultiSubNetFailover=True, your application will (essentially randomly) pick ONE of the two IPs for the listener and try it. There is no way to “rig” one IP to be tried first consistently over time. If it picks the primary replica, everything works! If it picks the IP for the current secondary replica… your application’s connection timeout will have to expire before it tries the next IP. This is why I’m bringing this up – without MultiSubNetFailover=True, the application will time out at SQL login.
This hasn’t been an issue with your other clients if they aren’t using a multisubnet availability group. If they have an Availability Group all inside only one subnet, then the Listener only has one IP in DNS, and MultiSubNetFailover=True isn’t required.
You should be fine to install the MSOLEDBSQL provider released in 2018 and use that in your data link. Obviously it should be tested for due diligence, but it should work. At the very least, you could instead try the SQL Server Native Client 11 (SQLNCLI11), which was released for SQL Server 2012, and it also should work just fine for both OLE DB and ODBC.
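For reference, here's a sketch of what the updated Data Link connection string could look like with the MSOLEDBSQL provider (the listener name, database, and credentials below are placeholders):
Provider=MSOLEDBSQL;Password="whatever";Persist Security Info=True;User ID=username;Initial Catalog=Test;Data Source=SQLListenerNameWhatever;MultiSubnetFailover=True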
Let me know if you have any more questions.
Connection string information for SQL Server Availability Groups:
  • In SSIS, the new MSOLEDBSQL OLE DB driver appears as "Native OLE DB\Microsoft OLE DB Driver for SQL Server". The old OLE DB driver is "Native OLE DB\Microsoft OLE DB Provider for SQL Server". 
Note: for a multisubnet Availability Group, you also need to make sure the listener's Windows Cluster resource has RegisterAllProvidersIP set to 1 (true)!

Edit March 2023: Added latest OLE DB/ODBC information.

Monday, April 15, 2019

How do I learn SQL Server despite limited SQL duties at work?


Got this email from a client in the southern US asking how to up their game in SQL Server, frustrated by a lack of hands-on opportunities to administer SQL in their current job duties. I also felt it necessary to discuss whether or not cert exams were appropriate, leaving that up to them, and then to share my preferred training methods.


To: William 

Subject: SQL Certification Questions

I am trying to gain a deeper knowledge to go and sit for the certification exams. What would be your best suggestion to immerse myself in SQL and learn the skills I need to sit for each exam? I have tried to just create tasks to force myself to learn and practice SQL query skills, and that works, but it has some limits. I learned a lot from that, but I learned so much more with some direction and a course structure. I have taken a couple of online SQL training courses and everything seems simple and logical. I start feeling less confident when I look at sample questions. They seem to go into deeper detail than what I have seen in the training classes, and deeper than what I see day to day. What would your suggestion be for the best method of gaining the skills needed to effectively manage my SQL environment?



From: William

Hi! Honored that you reached out, hope this email helps. I actually give a presentation on this topic, based on my experience as a writer for the last three generations of SQL certification exams for Microsoft. The exam writers are instructed to test experience by asking questions framed as tasks, testing whether or not the exam taker has “done this job” before. The exam writers are told to write for someone with 3-5 years of experience at a minimum, by testing things you can’t learn from reference docs alone (and especially on brand new features of the latest version of SQL). So that’s who I advise to take the exams: people with 3-5+ years of experience.
So my opinion here may be different from others', and especially from some managers', but I don’t feel it’s appropriate or productive to ask inexperienced resources to pursue exams. What’s more likely than a passing score is that a person becomes disillusioned, frustrated, or disengaged from career progress, or tries to cheat (with brain dumps), or quits and/or changes career path.
Given that, gauge for yourself whether you think an exam makes sense for you at this stage. Regardless, as for training resources: again, experience is the best teacher here, but I understand the frustration about not being exposed to much variety, as far as SQL development/administration goes. This blog post of mine has links to many resources; I’d point out specifically the “Stairways” and Microsoft Learn (formerly MVA). If you don’t already have a copy of my book, the Sparkhound office near your area can totally arrange that. Joining your local SQL User Group is good free training, as are all the virtual user groups that PASS provides for free. Highly recommend joining PASS, it’s free. There’s a SQLSaturday conference in Atlanta next weekend (again, more free training), and later this year there are SQLSaturdays in Memphis, Baton Rouge, and Pensacola, and there are SQLSaturdays that usually happen annually in Birmingham and Columbus, GA/Phenix City, AL (though I don’t see either in the upcoming events list on sqlsaturday.com right now…). The wife and I like to make little weekend trips out of those Saturday conferences, and before our kid went off to college, we’d bring them along too for fun.
 As for trying to get hands-on experience, I would give you the same advice I often give my team: “lab it out.” Use your local workstation or laptop, install SQL Server Developer edition, have a local instance to play with at all times. Sign up for an Azure account and use your free credits to run an Azure VM with SQL Developer edition, or if you have an MSDN account through work or otherwise, you get monthly credits for Azure spend. If Azure isn’t an option, use a home PC or server, the hardware doesn’t have to be production-quality to facilitate learning basic concepts and testing admin actions you shouldn’t try on production. I have learned and developed a lot of my toolbox “lab” scripts by just playing and breaking and fixing and dropping and recreating on my SQL sandboxes… all outside of production. So if you don’t have an admin sandbox, get one, just setting it up, breaking and fixing it will be a learning experience.  
I can’t speak personally to any of the paid training classes, other than that the SQLSkills folks literally wrote big parts of SQL Server and their training is considered top notch. The PASS Summit conference in Seattle is the biggest/best SQL Server conference, with two days of all-day deep-dive training followed by three long days of sessions, if you have that kind of training money.
 Let me know if you have any questions and best of luck!

Tuesday, March 05, 2019

Backup of an Azure SQL DB downloaded outside of Azure

From a dev colleague: “Asking for a client... do you know how to get a backup of an Azure SQL DB downloaded outside of Azure?”

Short answer - You can’t…

Medium answer - … and you don’t really need to for anything related to DR or HA. Backups are automagically handled in Azure, and Azure SQL DB geo-replication can be set up in a few button clicks. Instead of thinking of Azure SQL DB as a traditional SQL database on a server, think of it as a platform. Consider spinning up a stand-alone copy of the production Azure SQL DB in a cheaper tier to serve as pre-production; it’d be a better, truer dev/test/QA environment as a result.
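For example, here's a one-line sketch of creating that stand-alone copy with T-SQL, run while connected to the master database of the destination logical server (all names are placeholders; the copy's service tier can be lowered afterward):
CREATE DATABASE ProductionDb_Dev AS COPY OF yourserver.ProductionDb;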

Long answers – Okay, so you really want to get the Azure SQL DB to an on-prem SQL Server instance. Sigh.

  1. You can manually script it out into CREATE and INSERT T-SQL statements – schema and data – with various tools, including a compare with SSDT, or an officially supported free cross-platform scripter that is Python-based. Keep in mind this might be a huge file and a lengthy manual process with a large database.
  2. You could also use the “Export” button on the Azure portal for the Azure SQL DB to create a BACPAC on an Azure storage account. The file is locked with a username/password which you specify at the time of export. It will take a few minutes, even for a small database. Once it's done, you can download the .bacpac file from the blob location with Azure Storage Explorer or something similar, and then import the .bacpac (a sample SqlPackage import command is sketched below). Keep in mind that this might also be a huge file, and a lengthy manual process with a large database.
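If you go the BACPAC route, here's a sketch of the import step using the SqlPackage command-line tool (the file path, server, and database names are placeholders):
SqlPackage /Action:Import /SourceFile:"C:\Temp\MyAzureDb.bacpac" /TargetServerName:"localhost" /TargetDatabaseName:"MyAzureDb_Copy"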



Tuesday, December 18, 2018

Actual Emails: What is a "soft delete"?

What's a "soft delete", and why should DBAs be aware of such behavior in tables?

So, a DELETE statement is a “hard” delete. The data is gone.

However, it is a common practice to build into tables a set of auditing fields…

For example:
CREATE TABLE dbo.whatever(
Id int identity(1,1) not null primary key,
WhateverInt int not null,
…
Createddate datetimeoffset(0) not null,
Createdby varchar(250) not null,
Modifieddate datetimeoffset(0) null,
Modifiedby varchar(250) null,
IsActive bit not null CONSTRAINT DF_whatever_IsActive
DEFAULT (1)
)
The IsActive field, or anything similarly named like IsArchived or IsDeleted, is a common strategy to “soft” delete something and remove it from resultsets, but still retain history. All the queries on the table would then use WHERE IsActive = 1 to make sure they only viewed Active data, for example.

So, if you UPDATE a record to IsActive = 0 instead of DELETEing it, it disappears from queries and reports and screens, but it’s still there in case it’s useful. There may be some queries that intentionally want to query IsActive = 0 for historical or auditing purposes. 
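For example, a soft delete of one row in the table above might look like this (a minimal sketch; the @Id parameter and the user name are placeholders):
UPDATE dbo.whatever
SET IsActive = 0,
Modifieddate = SYSDATETIMEOFFSET(),
Modifiedby = 'username'
WHERE Id = @Id;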

Often, a filtered nonclustered index (introduced in SQL 2008) can be created to match the WHERE IsActive = 1 predicate; the index is much smaller as a result and allows SQL Server to dramatically improve performance on those queries. This is really only effective if a majority of the data has actually been soft-deleted.

For example: 
CREATE NONCLUSTERED INDEX IDX_NC_whatever_WhateverInt
ON dbo.whatever (WhateverInt) 
WHERE IsActive = 1
We had one client with an IsArchived flag (or similar) on a table with tens of millions of rows. Only 1% of the data was “active” and waiting to be archived. Their queries specified WHERE IsArchived = 0 but still performed very poorly – lots of data still had to be accessed. We added and modified some nonclustered indexes as filtered indexes to account for IsArchived = 0, and not only did this greatly reduce the size of the indexes, but the query result time went from minutes to instantaneous. Huge difference. We could also have added similarly filtered nonclustered columnstore indexes.


DBAs need to understand table design and recognize "soft delete" keys and the associated business logic in applications, because the Missing Indexes feature and other common query tuning tools won't suggest filtered indexes for them.

Furthermore, we can even enforce filtered uniqueness using a unique nonclustered index. In this way, we could make sure that there is only one active record for a given key set, but allow many inactive records for the same key set!
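For example, against the table above, here's a sketch of enforcing "only one active row per WhateverInt" while still allowing any number of inactive rows:
CREATE UNIQUE NONCLUSTERED INDEX UQ_NC_whatever_WhateverInt_Active
ON dbo.whatever (WhateverInt)
WHERE IsActive = 1;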



Sunday, August 23, 2015

Actual Emails: Allow NUNS to Lead You to Good Clustered Index Design

Much more on this topic at the Sparkhound blog.

Wrote this email exchange with a colleague who wanted to confirm that the client-proposed design for a table was... less than optimal.


Subject: clustered index
From: A. Developer

Hey William,
I believe the client created this Clustered Index... <horrifying screenshot of a clustered index with many large nvarchar columns as the key>

From: William 

The clustered index is ideally:
1) Non-changing
2) Unique
3) Narrow
4) Sequential
“NUNS”
Having those multiple nvarchar columns in the clustered index is probably not a good idea, as it violates #3 and probably #1 and #4 too.  
The most ideal clustered index is on an integer identity column. You can modify the design of a table to add one of those. 
Look at the data and suggest a new clustered index. That five-column clustered index might be a perfectly fine nonclustered index, but is an inefficient clustered key.
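As a sketch of that design change (table, column, and constraint names are placeholders, and the existing clustered index would need to be dropped or recreated as a nonclustered index first):
ALTER TABLE dbo.YourTable ADD Id int IDENTITY(1,1) NOT NULL;
ALTER TABLE dbo.YourTable ADD CONSTRAINT PK_YourTable PRIMARY KEY CLUSTERED (Id);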

From: A. Developer
That makes sense. Thanks William!

Wednesday, February 05, 2014

Actual Emails: More Grief for GUIDs

Wrote this email exchange with some developer colleagues about to embark on performance tuning.
Subject: bag o' tricks from DMV talk
From: A. Developer
Hey William,
We are about to spend around 4 weeks performance tuning an application we built. Much of the performance tuning is going to be the database. Could you send over those dmv queries that you used in your DMV talk? Or do you still have them available for download? I think they will be able to help us out quite a bit.
I know one of the big things you talked about is not using GUIDs, however, we are using them because of replication. Do you have any thoughts on helping ease the fragmentation because of this?

From: William Assaf

Yep, I have all that info here. This is the last time I gave that talk: http://www.sqltact.com/2013/09/houston-tech-fest-2013-sql-admin-best.html
Of course, if you have any questions, let me know.

So, as you know, I hate GUIDs because they are 4x as large as an integer yet serve the same purpose, or 2x as large as two integer columns to allow for multiple device synchronization.

But the biggest problem with GUIDs can happen when the first key of the clustered index of a table is a GUID column. With the creation of the new GUIDs, are you doing random GUIDs or sequential GUIDS?

If you’re creating them with a SQL default value (which you’re probably not, but as an example), this would be the difference between the newid() function (bad) and the newsequentialid() function (not as bad).
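As a minimal sketch of that difference (table and column names here are just placeholders):
-- random: new values land all over the clustered index, fragmenting it
CREATE TABLE dbo.ExampleRandom (
RowGuid uniqueidentifier NOT NULL DEFAULT NEWID() PRIMARY KEY CLUSTERED
);
-- sequential: NEWSEQUENTIALID() is only valid in a DEFAULT constraint
CREATE TABLE dbo.ExampleSequential (
RowGuid uniqueidentifier NOT NULL DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED
);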

Using sequential GUIDs allows you to create a clustered index that actually makes some sense when it is ordered, and that can have index maintenance performed on it to REBUILD or REORGANIZE, reducing fragmentation. Problem is, when you restart the SQL Server service, the sequence also resets. So you won't have one contiguous string of sequentially inserted GUIDs in a table over its lifetime.

With random GUIDs, you can REBUILD or REORGANIZE all you want, but the data order still won’t make any sense. Changing from random to sequential GUIDs may be really easy in your database or application code. If you’re already using sequential GUIDs, there’s not really much more you can do to mitigate the performance and storage letdowns of GUIDs that you would not also do on tables that use integer IDENTITY columns.

As for basic indexing, run this script first to find any tables that are still heaps. Those should be the first things you target in your performance tuning phase. http://www.sqltact.com/2013/05/hunt-down-tables-without-clustered.html
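In case that link is unavailable, here's a minimal query along the same lines (not the exact script from that post) to list user tables that are still heaps:
SELECT OBJECT_SCHEMA_NAME(t.object_id) AS SchemaName, t.name AS TableName
FROM sys.tables AS t
INNER JOIN sys.indexes AS i
ON t.object_id = i.object_id AND i.type_desc = 'HEAP'
ORDER BY SchemaName, TableName;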

If I can help or get in on that performance tuning stuff, let me know! Good luck, keep in touch.

From: A. Developer
Thanks for the info.
One question though. Will sequential GUIDs work with replication? Wouldn't there be collisions if they are sequential?


From: William

So, in theory, it is mathematically "unlikely" that GUIDs will have collisions. That math always troubles me when multiple machines are generating GUIDs, though the network card is used to generate a unique seed for the GUID on each machine. It is even more troublesome when GUIDs are generated sequentially, but we are supposed to trust the math. :/
I’m not super knowledgeable about that math but from what I know, your concern is understandable but “unlikely” to be an issue. I know, not comforting. Here is some comforting reference material though. http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx 
Uses words like “practically guaranteed” and “virtually impossible” for sequential GUID collisions.



I welcome comments and additions to this somewhat religious topic. The "oil rig" scenario that is commonly faced in the Gulf South provides a recurring fresh argument for/against GUIDs. In many scenarios, a pair of integer fields can provide the benefits of sequential, unique keys. Assuming one of the pair identifies the rig/device/source of the data, you also get a built-in foreign key in the unique key, something you would have to store anyway in a table keyed on a GUID.

Tuesday, September 03, 2013

Actual Emails: TL;DR: Stop using varchar(max)

Wrote this email recently to a crew of developers who were shooting themselves in the foot with a database rich in varchar(max) data types.

Hey folks-
TL;DR: Stop using varchar(max). We’re not storing books.
We need to review and avoid the varchar(max) data type in our tables. Here’s a short treatise as to why.
In SQL Server, varchar(max) is intended to replace the old text data type, which was different from varchar(n) because it was designed to store massive amounts of data: greater than 8,000 characters, and all the way up to 2 GB of data per value. That’s what the varchar(max), varbinary(max), and nvarchar(max) data types are optimized for – HUGE blocks of data in a single cell. We should only use it if we're actually intending to store massive text and use the full-text indexing engine (a completely separate and specific topic for text blocks).
This is an oversimplification, but varchar(max) is designed to store data differently, and specially for large text blocks. It appears to behave the same as a varchar(n) field, and that’s deceptive when we are throwing 100-200 characters in each row field.
The big drawbacks biting us right now about varchar(max) have to do with indexing, and this is regardless of how much data is actually in a varchar(max) field. A varchar(max) column can’t be the key of a nonclustered index, even if it never stores more than 8000 characters, and can’t have ONLINE index maintenance performed.  As a result, it is generally a giant pain for indexing, a pain you only want to put up with if you absolutely have to.
Furthermore, we’re doing ourselves a disservice for performance, straight up. Unless you’re storing books, (max) hurts performance. Check out this blog post: http://rusanu.com/2010/03/22/performance-comparison-of-varcharmax-vs-varcharn/ 
In short, varchar(max) is burdensome overkill for our datasets.
So here’s the solution… Change varchar(max) to varchar(n), where n is a generous but appropriate number for that column’s data. If the data import wizard creates varchar(max) columns for us when importing from Excel, change them to varchar(8000), which is the highest number you can assign to a varchar field. Or better yet, once the data is in SQL, use this simple syntax to find out the max length of a column and then pad it.
For example: select MAX(LEN([yourcolumn])) from yourtable 
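Then, once a sensible size is chosen, the change itself is a one-liner (a sketch; the names and the 200 are placeholders, and you should match the column's existing NULL/NOT NULL setting):
ALTER TABLE dbo.yourtable ALTER COLUMN yourcolumn varchar(200) NOT NULL;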
Problem is, our SSIS packages are all very picky about data types and will break if we just change them. So, after making these table changes, you’ll need to open your SSIS package, open the data flow destination or other affected object, hit OK to apply the new metadata, then save and deploy it again. No actual changes necessary.
This all came up because we have data quality issues with the fields Foo and Bar. Both of those columns are varchar(max). I’m dumping the varchar(max) data into temp tables with varchar(200) to get the queries to return in a reasonable amount of time. 
Let me know if you have any questions!
 William 

I like to use the word treatise to prepare my audience for verbosity.

Thursday, August 16, 2012

Actual Emails: The Customer Doesn't Want Full Recovery Mode


Got this question in an email.  I'll paraphrase.

William 
Hoping you can advise me on best way to handle this. 
My client doesn’t want to use SQL backups. Instead, they are using VM backups and third-party backup software, by choice.

Without a DBA on staff, they would prefer to deal with SQL as little as possible.

They do not plan on doing any transaction log backups and are not concerned with restoring to the point of failure.
... 
Goal would be to not have transaction logs or somehow keep those logs small.   
Colleague

Here's how I replied.

So, the first question to ask a client who doesn't want transaction log backups is, "why?" 
The transaction log is not something to be afraid of. Backing up the transaction log is a simple process and clears out the log. Back up the log at regular intervals (somewhere between 15 and 60 minutes is typical). Then you get all the advantages of point-in-time recovery: if you have a 5:01 pm disaster, you could restore all the way to 5:00 pm, for example. Plus, full recovery mode is required for most SQL-based HA/DR scenarios.
Using third-party backup software is fine, but they must have purchased the SQL plugin. This is usually an extra-cost add-on for whatever backup license they bought. You CANNOT back up an .mdf file with a file system backup and expect that to work. SQL MUST BACKUP SQL.
That SQL plugin does exactly that for the third-party software: it sends commands to SQL Server to perform a backup. It's fine if they don't want to use SQL maintenance plans or jobs to perform the backups, but they must use the SQL plugin for any third-party enterprise backup software.
With a database in full recovery mode, you must perform full and transaction log backups, or the log will grow.
Typically, you do a nightly full backup and hourly tran log backups. The reason people get scared/confused/frustrated with transaction logs is that they don't realize you are supposed to back up the log (which empties it) regularly. As in, at least hourly. If they want the log to stay small, back it up every 15 minutes. BackupExec can do that.
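If they ever run the backups natively instead of through the backup software, here's a minimal sketch of those two commands (the database name and file paths are placeholders; a real job would add date-stamped file names and verification):
BACKUP DATABASE YourDb TO DISK = N'D:\Backup\YourDb_Full.bak' WITH COMPRESSION, CHECKSUM;
BACKUP LOG YourDb TO DISK = N'D:\Backup\YourDb_Log.trn' WITH COMPRESSION, CHECKSUM;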
If they are really stubborn about not wanting to use all the DR capability that they paid for, putting the database in simple recovery mode is fine. Only full SQL backups are needed then. In simple mode, transactions aren't stored in the log after they commit. A transaction commits, there's a checkpoint, and then it is wiped from the log. A simple recovery mode database will have its transaction log grow to the size of the biggest single concurrent set of transactions at any given time.
Never truncate a log file. Perform a log backup instead, which will empty the file. The file may still be large even though it is empty; in that case, if you absolutely need the space back, you can shrink the log file. Never shrink a data or log file on a schedule; it creates fragmentation, which hurts performance.
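For that one-time space reclamation after a log backup, a sketch (the logical log file name and the 1024 MB target size are placeholders):
USE YourDb;
DBCC SHRINKFILE (N'YourDb_log', 1024);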
Let me know if you have any questions! 
-William

Saturday, January 21, 2012

Actual Emails: Is "NOLOCK is the epitome of evil?"

Sent: Friday, January 20, 2012 10:30 AM
To: William Assaf
Subject: dba has a question
William,

Can you send me references to support my contention that NOLOCK is the epitome of evil?
That is, that it can return invalid and/or unrelated data.
He is still suggesting that folks use NOLOCK (rather than READPAST).
We have to work around other poorly written queries, so some of these are inevitable.

Thanks,
[my colleague's name redacted to protect the innocent]