
Avoiding the Out Of Memory Error When Using Database Project Deployment

Few of us who work with tables holding billions or even trillions of records have a test database that houses the same volume of data; the cost of hardware and the time spent refreshing data often keep our test environments much smaller.

However, working with a limited test environment means that some issues are not uncovered until code runs against the sheer volume of data in production. Queries that performed fine with only a couple million rows develop performance problems, newer coding techniques hit their limits, and database project deployment can run the server out of memory.

The Advantages of Using Database Projects

Database projects, introduced to Visual Studio in 2013, let database objects be developed with the same control and workflow as other types of code.

  • Database projects allow version control and historical tracking. Projects can be checked into source control, making it possible to merge in other developers' code and compile the result.
  • All database objects can be maintained in one place. This eliminates wondering where the latest version of a table or procedure lives; the database project becomes the source of truth.
  • Deployment scripts can be generated. This saves the time and effort of hand-writing scripts to deploy database objects without data loss.
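That last point is worth a concrete example. A deployment script can be generated from the command line with the SqlPackage utility; the sketch below assumes a built .dacpac, and the server, database, and file names are placeholders:

```shell
# Generate (but do not run) a deployment script from a built dacpac.
# Server, database, and file names here are placeholders.
sqlpackage /Action:Script \
    /SourceFile:"bin/Release/MyDatabase.dacpac" \
    /TargetServerName:"MyServer" \
    /TargetDatabaseName:"MyDatabase" \
    /OutputPath:"deploy.sql"
```

The /Action:Script option writes the script to the output path instead of applying it, which is what makes the inspection described in the next section possible.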

What Publish Does When Reordering Fields in a Table

These generated deployment scripts save a lot of time. They compare the version in the project with what is in the target database and produce an executable script that updates the objects. While the process has some error checking built in, such as checking whether a table whose fields it is about to reorder contains records, it does allow you to reorder fields in a table.

You may ask, "why do you need to reorder fields in a table?" Most developers are content to add new columns to the end of the table and call it a day. But when you are dealing with a large number of fields due to denormalized data, having some sort of order to the fields makes them faster to find, both in result sets and in queries.
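Note that T-SQL has no ALTER TABLE syntax for repositioning a column, which is why the publish process rebuilds the table instead. The ordinal position of each column is visible in the catalog, which is also a handy way to verify the order after a deployment; a minimal check, with the schema and table names as placeholders:

```sql
-- Inspect the physical column order of a table
-- (schema and table names are placeholders).
SELECT c.column_id
	, c.name
FROM sys.columns AS c
WHERE c.[object_id] = OBJECT_ID(N'[Product].[Product]')
ORDER BY c.column_id;
```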

The steps the publish process takes to reorder fields in a table are very straightforward:

  1. A transaction is started.
  2. A table with the fields in the correct order is created in the database using the same schema. It is given the name tmp_ms_xx_tablename where tablename is the name of your original table.
  3. It copies all the data from the old table to the new table.
  4. It drops the old table.
  5. It renames the new table as the old table.
  6. It commits the transaction.

Here is example code where we have added a Product Average Daily Balance field to a product table (banking industry):

BEGIN TRANSACTION;
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SET XACT_ABORT ON;

CREATE TABLE [Product].[tmp_ms_xx_Product] (
	[ProductPK] INT IDENTITY(1, 1) NOT NULL
	, [AccountPK] INT NOT NULL
	, [ProductDefinitionPK] INT NULL
	, [ProcessDateKey] INT NOT NULL
	, [ProductAccrualBasis] VARCHAR(50) NULL
	, [ProductActivityDateKey] INT NOT NULL
	, [ProductAverageDailyBalance] NUMERIC(15, 2) NULL --this is where the new field is inserted
	, [ProductBalance] NUMERIC(15, 2) NULL
	, [ProductChargeOffAmount] NUMERIC(15, 2) NULL
	, [ProductChargeOffDateKey] INT NOT NULL
	, [ProductChargeOffType] INT NULL
	, [ProductChargeOffTypeDescription] VARCHAR(50) NOT NULL
	, [ProductClass] VARCHAR(10) NOT NULL
	, [ProductCloseDateKey] INT NOT NULL
	, [ProductDelinquentDays] INT NULL
	, [ProductInactiveDays] INT NULL
	, [ProductInterestRate] NUMERIC(15, 6) NULL
	, [ProductMaturityDateKey] INT NOT NULL
	, [ProductOpenDateKey] INT NOT NULL
	, [ProductOriginalBalance] NUMERIC(15, 2) NULL
	, [ProductStatus] VARCHAR(15) NULL
	, [ProductType] INT NULL
	, [CreatedDateTime] DATETIME CONSTRAINT [DF_Product_Product_CreatedDateTime] DEFAULT(getdate()) NOT NULL
	, [CreatedExecutionLogID] INT NOT NULL
	, CONSTRAINT [tmp_ms_xx_constraint_PK_Product_Product1] PRIMARY KEY CLUSTERED ([ProductPK] ASC)
	);

IF EXISTS (
		SELECT TOP 1 1
		FROM [Product].[Product]
		)
BEGIN
	SET IDENTITY_INSERT [Product].[tmp_ms_xx_Product] ON;

	INSERT INTO [Product].[tmp_ms_xx_Product] (
		[ProductPK]
		, [AccountPK]
		, [ProductDefinitionPK]
		, [ProcessDateKey]
		, [ProductAccrualBasis]
		, [ProductActivityDateKey]
		, [ProductBalance]
		, [ProductChargeOffAmount]
		, [ProductChargeOffDateKey]
		, [ProductChargeOffType]
		, [ProductChargeOffTypeDescription]
		, [ProductClass]
		, [ProductCloseDateKey]
		, [ProductDelinquentDays]
		, [ProductInactiveDays]
		, [ProductInterestRate]
		, [ProductMaturityDateKey]
		, [ProductOpenDateKey]
		, [ProductOriginalBalance]
		, [ProductStatus]
		, [ProductType]
		, [CreatedDateTime]
		, [CreatedExecutionLogID]
		)
	SELECT [ProductPK]
		, [AccountPK]
		, [ProductDefinitionPK]
		, [ProcessDateKey]
		, [ProductAccrualBasis]
		, [ProductActivityDateKey]
		, [ProductBalance]
		, [ProductChargeOffAmount]
		, [ProductChargeOffDateKey]
		, [ProductChargeOffType]
		, [ProductChargeOffTypeDescription]
		, [ProductClass]
		, [ProductCloseDateKey]
		, [ProductDelinquentDays]
		, [ProductInactiveDays]
		, [ProductInterestRate]
		, [ProductMaturityDateKey]
		, [ProductOpenDateKey]
		, [ProductOriginalBalance]
		, [ProductStatus]
		, [ProductType]
		, [CreatedDateTime]
		, [CreatedExecutionLogID]
	FROM [Product].[Product]
	ORDER BY [ProductPK] ASC;

	SET IDENTITY_INSERT [Product].[tmp_ms_xx_Product] OFF;
END

DROP TABLE [Product].[Product];

EXECUTE sp_rename N'[Product].[tmp_ms_xx_Product]'
	, N'Product';

EXECUTE sp_rename N'[Product].[tmp_ms_xx_constraint_PK_Product_Product1]'
	, N'PK_Product_Product'
	, N'OBJECT';

COMMIT TRANSACTION;

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
GO

 

Big Data Considerations

The above script runs fine when you have thousands of records in the Product table. But when you are dealing with billions of records and more, the server can quickly reach its limit.

Consider what the script is doing to get a sense of the resources involved:

  • The new table, fully populated, will exist at the same time as the old table. This doubles the amount of space needed to store the data.
  • The transaction log will hold information for every row inserted into the new table, and because everything happens in a single transaction, none of that log can be released until the commit.

With this much extra data in flight, you can see how the process can quickly generate an out-of-memory error.
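Before running a rebuild like this against a large table, it can help to watch how much of the transaction log is being consumed. One way to check, assuming SQL Server 2012 or later:

```sql
-- Log size and usage for the current database (SQL Server 2012+).
SELECT total_log_size_in_bytes / 1048576.0 AS log_size_mb
	, used_log_space_in_bytes / 1048576.0 AS used_log_mb
	, used_log_space_in_percent
FROM sys.dm_db_log_space_usage;
```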

How to Get Around the Out-of-Memory Error

Getting around the out-of-memory error involves removing the transaction and processing the data in chunks.

Warning: Removing transactions from data operations on a server that is actively processing CRUD is dangerous and can lead to lost or inconsistent data. The following code assumes that the updates to the table are occurring on a database with no other operations going on.
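One way to enforce that assumption is to put the database in single-user mode for the duration of the rebuild; a sketch, with the database name as a placeholder:

```sql
-- Block all other connections while the rebuild runs
-- (database name is a placeholder).
ALTER DATABASE [MyDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- ... run the chunked rebuild here ...

ALTER DATABASE [MyDatabase] SET MULTI_USER;
```

ROLLBACK IMMEDIATE disconnects other sessions and rolls back their open transactions, so schedule this for a maintenance window.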

When Your Data Is Date Stamped

When your table's data is date stamped, it becomes fairly easy to move the data by date.

  1. Create the new table, just as the script would.
  2. Instead of putting data into the new table in one batch, use a cursor to segment the data by date.
  3. At the end, check that the record counts in the new table and the old table match before dropping the old table and renaming the new one.

DROP TABLE IF EXISTS [Product].[tmp_ms_xx_Product]; --it is good practice to drop a table if you are going to recreate it
GO
CREATE TABLE [Product].[tmp_ms_xx_Product] (
	[ProductPK] INT IDENTITY(1, 1) NOT NULL
	, [AccountPK] INT NOT NULL
	, [ProductDefinitionPK] INT NULL
	, [ProcessDateKey] INT NOT NULL
	, [ProductAccrualBasis] VARCHAR(50) NULL
	, [ProductActivityDateKey] INT NOT NULL
	, [ProductAverageDailyBalance] NUMERIC(15, 2) NULL
	, [ProductBalance] NUMERIC(15, 2) NULL
	, [ProductChargeOffAmount] NUMERIC(15, 2) NULL
	, [ProductChargeOffDateKey] INT NOT NULL
	, [ProductChargeOffType] INT NULL
	, [ProductChargeOffTypeDescription] VARCHAR(50) NOT NULL
	, [ProductClass] VARCHAR(10) NOT NULL
	, [ProductCloseDateKey] INT NOT NULL
	, [ProductDelinquentDays] INT NULL
	, [ProductInactiveDays] INT NULL
	, [ProductInterestRate] NUMERIC(15, 6) NULL
	, [ProductMaturityDateKey] INT NOT NULL
	, [ProductOpenDateKey] INT NOT NULL
	, [ProductOriginalBalance] NUMERIC(15, 2) NULL
	, [ProductStatus] VARCHAR(15) NULL
	, [ProductType] INT NULL
	, [CreatedDateTime] DATETIME CONSTRAINT [DF_Product_Product_CreatedDateTime] DEFAULT(getdate()) NOT NULL
	, [CreatedExecutionLogID] INT NOT NULL
	, CONSTRAINT [tmp_ms_xx_constraint_PK_Product_Product1] PRIMARY KEY CLUSTERED ([ProductPK] ASC)
	);

SET NOCOUNT ON
DECLARE @ProcessDateKey INT
DECLARE curDate CURSOR
FOR
SELECT DISTINCT [ProcessDateKey]
FROM [Product].[Product]
ORDER BY [ProcessDateKey]

OPEN curDate

FETCH NEXT
FROM curDate
INTO @ProcessDateKey

WHILE @@FETCH_STATUS = 0

BEGIN
	SET IDENTITY_INSERT [Product].[tmp_ms_xx_Product] ON;

	INSERT INTO [Product].[tmp_ms_xx_Product] (
		[ProductPK]
		, [AccountPK]
		, [ProductDefinitionPK]
		, [ProcessDateKey]
		, [ProductAccrualBasis]
		, [ProductActivityDateKey]
		, [ProductBalance]
		, [ProductChargeOffAmount]
		, [ProductChargeOffDateKey]
		, [ProductChargeOffType]
		, [ProductChargeOffTypeDescription]
		, [ProductClass]
		, [ProductCloseDateKey]
		, [ProductDelinquentDays]
		, [ProductInactiveDays]
		, [ProductInterestRate]
		, [ProductMaturityDateKey]
		, [ProductOpenDateKey]
		, [ProductOriginalBalance]
		, [ProductStatus]
		, [ProductType]
		, [CreatedDateTime]
		, [CreatedExecutionLogID]
		)
	SELECT [ProductPK]
		, [AccountPK]
		, [ProductDefinitionPK]
		, [ProcessDateKey]
		, [ProductAccrualBasis]
		, [ProductActivityDateKey]
		, [ProductBalance]
		, [ProductChargeOffAmount]
		, [ProductChargeOffDateKey]
		, [ProductChargeOffType]
		, [ProductChargeOffTypeDescription]
		, [ProductClass]
		, [ProductCloseDateKey]
		, [ProductDelinquentDays]
		, [ProductInactiveDays]
		, [ProductInterestRate]
		, [ProductMaturityDateKey]
		, [ProductOpenDateKey]
		, [ProductOriginalBalance]
		, [ProductStatus]
		, [ProductType]
		, [CreatedDateTime]
		, [CreatedExecutionLogID]
	FROM [Product].[Product]
	WHERE [ProcessDateKey] = @ProcessDateKey
	ORDER BY [ProductPK] ASC;

	SET IDENTITY_INSERT [Product].[tmp_ms_xx_Product] OFF;

	FETCH NEXT
	FROM curDate
	INTO @ProcessDateKey
END

CLOSE curDate
DEALLOCATE curDate
SET NOCOUNT OFF

DECLARE @NEW INT = (
		SELECT COUNT(*)
		FROM [Product].[tmp_ms_xx_Product]
		)
DECLARE @OLD INT = (
		SELECT COUNT(*)
		FROM [Product].[Product]
		)

IF @NEW != @OLD
BEGIN
	PRINT 'ERROR ON [Product].[Product]'

	RETURN
END
ELSE
BEGIN
	PRINT 'SUCCESS ON [Product].[Product]'

	DROP TABLE [Product].[Product];

	EXECUTE sp_rename N'[Product].[tmp_ms_xx_Product]'
		, N'Product';

	EXECUTE sp_rename N'[Product].[tmp_ms_xx_constraint_PK_Product_Product1]'
		, N'PK_Product_Product'
		, N'OBJECT';
END
GO

 

When Your Data Is Not Date Stamped

If your data cannot be cleanly delineated by date, you can use a helper table to do the chunking.

DROP TABLE IF EXISTS [Product].[tmp_ms_xx_Product]; --it is good practice to drop a table if you are going to recreate it
GO
CREATE TABLE [Product].[tmp_ms_xx_Product] (
	[ProductPK] INT IDENTITY(1, 1) NOT NULL
	, [AccountPK] INT NOT NULL
	, [ProductDefinitionPK] INT NULL
	, [ProcessDateKey] INT NOT NULL
	, [ProductAccrualBasis] VARCHAR(50) NULL
	, [ProductActivityDateKey] INT NOT NULL
	, [ProductAverageDailyBalance] NUMERIC(15, 2) NULL
	, [ProductBalance] NUMERIC(15, 2) NULL
	, [ProductChargeOffAmount] NUMERIC(15, 2) NULL
	, [ProductChargeOffDateKey] INT NOT NULL
	, [ProductChargeOffType] INT NULL
	, [ProductChargeOffTypeDescription] VARCHAR(50) NOT NULL
	, [ProductClass] VARCHAR(10) NOT NULL
	, [ProductCloseDateKey] INT NOT NULL
	, [ProductDelinquentDays] INT NULL
	, [ProductInactiveDays] INT NULL
	, [ProductInterestRate] NUMERIC(15, 6) NULL
	, [ProductMaturityDateKey] INT NOT NULL
	, [ProductOpenDateKey] INT NOT NULL
	, [ProductOriginalBalance] NUMERIC(15, 2) NULL
	, [ProductStatus] VARCHAR(15) NULL
	, [ProductType] INT NULL
	, [CreatedDateTime] DATETIME CONSTRAINT [DF_Product_Product_CreatedDateTime] DEFAULT(getdate()) NOT NULL
	, [CreatedExecutionLogID] INT NOT NULL
	, CONSTRAINT [tmp_ms_xx_constraint_PK_Product_Product1] PRIMARY KEY CLUSTERED ([ProductPK] ASC)
	);

SET NOCOUNT ON

DECLARE @BatchSize INT = 1000000 --set this appropriately for your server capacity
DECLARE @RecordCount INT = (
		SELECT COUNT(*)
		FROM [Product].[Product]
		)
DECLARE @GroupCount INT = (@RecordCount / @BatchSize) + 1

DROP TABLE IF EXISTS #PK
SELECT ROW_NUMBER() OVER (
		ORDER BY [ProductPK] ASC
		) RowNum
	, [ProductPK] AS PK
INTO #PK
FROM [Product].[Product]

CREATE INDEX IX_TMP_PK ON #PK ([PK])

DECLARE @Counter INT = 1

WHILE @Counter <= @GroupCount
BEGIN
	SET IDENTITY_INSERT [Product].[tmp_ms_xx_Product] ON;

	INSERT INTO [Product].[tmp_ms_xx_Product] (
		[ProductPK]
		, [AccountPK]
		, [ProductDefinitionPK]
		, [ProcessDateKey]
		, [ProductAccrualBasis]
		, [ProductActivityDateKey]
		, [ProductBalance]
		, [ProductChargeOffAmount]
		, [ProductChargeOffDateKey]
		, [ProductChargeOffType]
		, [ProductChargeOffTypeDescription]
		, [ProductClass]
		, [ProductCloseDateKey]
		, [ProductDelinquentDays]
		, [ProductInactiveDays]
		, [ProductInterestRate]
		, [ProductMaturityDateKey]
		, [ProductOpenDateKey]
		, [ProductOriginalBalance]
		, [ProductStatus]
		, [ProductType]
		, [CreatedDateTime]
		, [CreatedExecutionLogID]
		)
	SELECT [ProductPK]
		, [AccountPK]
		, [ProductDefinitionPK]
		, [ProcessDateKey]
		, [ProductAccrualBasis]
		, [ProductActivityDateKey]
		, [ProductBalance]
		, [ProductChargeOffAmount]
		, [ProductChargeOffDateKey]
		, [ProductChargeOffType]
		, [ProductChargeOffTypeDescription]
		, [ProductClass]
		, [ProductCloseDateKey]
		, [ProductDelinquentDays]
		, [ProductInactiveDays]
		, [ProductInterestRate]
		, [ProductMaturityDateKey]
		, [ProductOpenDateKey]
		, [ProductOriginalBalance]
		, [ProductStatus]
		, [ProductType]
		, [CreatedDateTime]
		, [CreatedExecutionLogID]
	FROM [Product].[Product] p
	INNER JOIN #PK t ON p.[ProductPK] = t.[PK]
	WHERE FLOOR(t.RowNum / @BatchSize) = @Counter - 1
	ORDER BY [ProductPK] ASC;

	SET IDENTITY_INSERT [Product].[tmp_ms_xx_Product] OFF;

	SELECT @Counter += 1
END

SET NOCOUNT OFF

DECLARE @NEW INT = (
		SELECT COUNT(*)
		FROM [Product].[tmp_ms_xx_Product]
		)
DECLARE @OLD INT = (
		SELECT COUNT(*)
		FROM [Product].[Product]
		)

IF @NEW != @OLD
BEGIN
	PRINT 'ERROR ON [Product].[Product]'

	RETURN
END
ELSE
BEGIN
	PRINT 'SUCCESS ON [Product].[Product]'

	DROP TABLE [Product].[Product];

	EXECUTE sp_rename N'[Product].[tmp_ms_xx_Product]'
		, N'Product';

	EXECUTE sp_rename N'[Product].[tmp_ms_xx_constraint_PK_Product_Product1]'
		, N'PK_Product_Product'
		, N'OBJECT';
END
GO

 

Conclusion

By modifying the database project publish script, you can process very large tables in chunks and avoid the out-of-memory error caused by the single large transaction in the default script.

We can help you make the most of your data.

Marathon offers a complete portfolio of business intelligence and data analytics services, providing valuable insights and opportunities to build a roadmap to success. Let's talk about your project today.

Laura Moss, Core Contributor

Laura Moss is a senior software engineer with Marathon Consulting. As a data wrangler, she specializes in data warehouse architecture and moving data between systems. Her inbox is always empty.
