Monday, April 29, 2019

Four Data Integration Design Questions to Ask

We get requests to move data between custom systems regularly, even within systems. I was advising a client on something fairly simplea collection of tables out of one vendor system to feed anotherand I thought I'd make a blog post out of the array of questions that always come up.

Regardless of the strategy for data movement, whether it be:
  • SQL Server Integration Services (SSIS) locally or in Azure Integration Runtime (IR)
  • Stored procedures
  • SQL replication
  • Secondary readable Availability Groups
  • Azure Data Factory 2.0 (not 1.0, oh goodness, never 1.0)
  • Transactional movement featuring message queues or APIs
  • Any streaming solution
  • ETL or ELT
  • Any other kind of transformation I'm forgetting to mention
The following questions should be asked before designing a data movement plan.

(There are no correct answers to these questions of course, but you must be able to determine the answers from the business case.)

1. What is the latency requirement for the changes from the data source(s) to be copied to the destination?
Common answers: Instantly, no longer than 5 min, or 30 min, or nightly.

2. How many rows are expected to change in the source(s) in a given time period? 
Common answers: Anywhere from few rows per month to all/most the rows in a table every day.

3. What types of data changes are performed in the source(s)? 
Is the source data inserted, updated, and/or deleted? 

4. Do we have a reliable way to identify "the delta"? 
How do we know which rows have changed, including hard deleted rows (vs soft deleted rows)?

Let's dive more into the last question, because this is where the design meets the implementation method. There's a reason we always design tables with an IDENTITY column and some basic auditing fields.

First off, a truncate/insert strategy is not scalable. I have redesigned more of these strategies than I can remember, often because of database developer myopia. A truncate/reinsert strategy, even a bulk insert strategy, will inevitably outgrow its time boundary identified in Question 1. Don't waste your time and resources on such a strategy, you need to identify a way to find out what changed the in data source now.

But what if we don't or can't trust the application to always modify a "ChangeDate"? This is certainly the easiest way to know if the row has changed, but what if the original table wasn't designed with such a field? We should consider whether we can alter the data source(s) with useful, built-in SQL Server features like Change Tracking (CT), Change Data Capture (CDC), or a more recently-introduced feature called Temporal Tables. The latter can provide a reliable, built-in modified date and row history, transparent to applications. All of these strategies are well documented and have easy to use labs available.

Each of these solutions is very useful and recommended in its use case, and much preferred over a trigger-based system which will add complexity and overhead to transactions. A "pull" of recent changes is much preferred for most scenarios over a "push" of each change inside the transaction.

Caveats remain howeverand this came up with a recent clientthe impact on future updates/patches for databases must account for implementations of CT, CDC, or Temporal Tables. The same caveats apply to replication (useful in spots) and database triggers. Don't enable these SQL features without consulting with and advising the maintaining developers on the potential impact and need for testing.

One more crucial factor often overlooked as part of Question 4 are the intermediate transactions, especially in the case of less-than-instant data movement. If a row changes from status 1, to status 2, to status 3, can we just send over the row state with status 3? Or must we apply an insert for status 1, an update for status 2, and then another update for status 3 to the destination? This could be a major problem if the destination has an indirect dependency on evaluating the status changes; for example, to calculate the durations between statuses.

I once designed a data warehouse for tracking the performance of auditors, and we were analyzing the workflow for the bottlenecks in a 20-step process. Each of the 20 steps and its corresponding row state and time stamp were the whole point of the analysis. This demanded some sort of row-versioning in the data source. Not all change detection strategies work for this, however. Change Tracking, for example, would not suffice. Know your solutions!

You shouldn't move forward with any data movement design before answering these questions.

Are there any other common questions you'd ask for before deciding on a plan for a project like this?

Monday, April 15, 2019

How do I learn SQL Server despite limited SQL duties at work?


Got this email from a client in the southern US asking how to up their game in SQL Server, frustrated by a lack of hands-on opportunities to administer SQL via current job duties. I also felt it necessary to discuss whether or not cert exams were appropriate, leaving it up to them, and then my preferred training methods.


To: William 

Subject: SQL Certification Questions

I am trying to gain a deeper knowledge to go and sit for the certification exams. What would be your best suggestion to immerse myself into SQL and learn the skills I need to sit for each exam. I have tried to just create tasks to force myself to learn and practice SQL query skills and that works but it has some limits. I learned a lot from that, but I learned so much more with some direction and a course structure.  I have taken a couple of online SQL training courses and everything seems simple and logical. I start feeling like less confident when I look at sample questions. They seem to go into deeper detail than what I have seen in the training classes and deeper than what I see day to day. What would your suggestion be for the best method of gaining the skills needed to effectively manage my SQL environment?



From: William

Hi! Honored that you reached out, hope this email helps. I actually give a presentation on this topic, based on my experience as a writer for the last three generations of SQL certification exams for Microsoft. The exam writers are instructed to test experience by asking questions in the frame of tasks that test whether or not the exam taker has “do this job” before. The exam writers are told to test for someone with 3-5 years of experience at a minimum, by testing things you can’t learn from only reference docs (and especially on brand new features of the latest version of SQL). So that’s who I advise taking the tests: 3-5+ years of xp. 
So my opinion here may be different from others and especially from some managers, but I don’t feel it’s appropriate or productive to ask inexperienced resources to pursue exams. What’s more likely than a passing score is a person becoming disillusioned, frustrated, or disengaged from career progress, or they try to cheat (with brain dumps), or they quit and/or change career path.
 Given that, gauge for yourself whether you think an exam makes sense for you at this stage. Regardless, as for training resources:Again experience is the best teacher here, but I understand the frustration about not being exposed to much variety, as far as SQL development/administration goes. This blog post of mine has links to many resources, I’d point out specifically the “Stairways” and the MVAMicrosoft Learn. If you don’t already have a copy of my book, the Sparkhound office near your area can totally arrange that. Joining your local SQL User Group is good free training, as well as all the virtual user groups that PASS provides for free. Highly recommend joining PASS, it’s free. There’s a SQLSaturday conference in Atlanta next weekend, again more free training, later this year there are SQLSaturdays in Memphis and Baton Rouge and Pensacola, and there are SQLSaturdays that usually happen annually in Birmingham and Columbus, GA/Phenix City, AL (thought I don’t see either in the upcoming events list on sqlsaturday.com right now…)  The wife and I like to make little weekend trips out of those Saturday conferences, and before our kid went off to college, we’d bring them along too for fun.
 As for trying to get hands-on experience, I would give you the same advice I often give my team: “lab it out.” Use your local workstation or laptop, install SQL Server Developer edition, have a local instance to play with at all times. Sign up for an Azure account and use your free credits to run an Azure VM with SQL Developer edition, or if you have an MSDN account through work or otherwise, you get monthly credits for Azure spend. If Azure isn’t an option, use a home PC or server, the hardware doesn’t have to be production-quality to facilitate learning basic concepts and testing admin actions you shouldn’t try on production. I have learned and developed a lot of my toolbox “lab” scripts by just playing and breaking and fixing and dropping and recreating on my SQL sandboxes… all outside of production. So if you don’t have an admin sandbox, get one, just setting it up, breaking and fixing it will be a learning experience.  
I can’t speak personally to any of the paid training classes, other than the SQLSkills folks literally wrote big parts of SQL Server and their training is considered top notch. The PASS Summit conference in Seattle is the biggest/best SQL Server conference with two days of all-day deep dive trainings followed by three long days of sessions, if you have that kind of training money.
 Let me know if you have any questions and best of luck!

Thursday, April 04, 2019

Activate Conference 2019: Databases 101 for the Aspiring App Dev Session Info and Resources

A special hello if you're visiting this blog post during or after my workshop on Thursday afternoon, April 4 at the Louisiana Tech Park as park of the Activate Conference 2019!


Here are all the links, downloads, and more you need:

Databases 101 for the Aspiring App Dev
Workshop Student Track
1:00 PM

Presentation Downloads:


Slide deck: Download link  (.pptx files) 

Sample TSQL 101 script: Download link (.sql file)


Links to get setup with MS SQL Server:

This is the link to download SQL for FREE:
The tools to dev in the Microsoft ecosystem are all free:
Download the WideWorldImporters sample database:
ToolboxGithub (.sql files)
  • Look for toolbox.zip for an easy download, or 
  • Click on each file then download, or 
  • Click on each file, click “raw” on each file to copy/paste

Tuesday, April 02, 2019

Activate Conference 2019: Three Days of Tech Learning

Thursday April 4 I'll be giving a workshops at the super-slick-branded Activate Conference 2019 in Baton Rouge at the Louisiana Technology Park, a conference filled with people and topics much cooler than me and my "Databases 101 for the Aspiring App-Dev" 1.5 hour workshop.

Activate is an event that aims to enrich our local tech community by establishing and furthering the careers of individuals in the web and technology field.

3-Day registration for the whole conference, which includes a day of workshops, a hackathon, and a Saturday conference, is free for students. VIP Admission is also available which includes a t-shirt, swag bag and kickoff dinner.

On Saturday, Sparkhound will be hosting a table with our third annual Lego deconstruction competition, with an prize for the fastest times!

See you there, and best of luck to the organizers: Isral, Quinton, Lynsey, and all their many helpers and volunteers!