We're doing an ETL process. When all is said and done, there are a bunch of tables that should be identical. What is the quickest way to verify that those tables (on two different servers) are in fact identical? I'm talking both schema and data. Can I do a hash on the table itself, the way I could on an individual file or filegroup, to compare one to the other? We have Red Gate data compare, but since the tables in question contain millions of rows each, I'd like something a little more performant. One approach that intrigues me is a creative use of the UNION statement, but I'd like to explore the hash idea a little further if possible.

For any future visitors: here is the exact approach I ended up taking. It worked so well that we're now doing it on every table in each database. Thanks to the answers below for pointing me in the right direction. The script works roughly as follows (a hedged reconstruction of the whole thing is sketched further down):

- take a table-name parameter: if no table name was passed, do them all; otherwise just check the one
- create a temp table that lists all tables in the source database: CREATE TABLE #ChkSumSourceTables (... varchar(250), ... varchar(50), chksum int), filled from the catalog views (INNER JOIN sys.schemas S ON T.schema_id = S.schema_id, filtered on T.name LIKE the parameter)
- create a matching temp table, #ChkSumTargetTables, that lists all tables in the target database
- build a dynamic SQL statement to populate the temp tables with the checksum of each table: 'UPDATE #ChkSumSourceTables SET chksum = (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM ' + ... and the same for #ChkSumTargetTables
- EXEC the dynamic statements to populate the temp tables with the checksums
- compare the two databases to find any checksums that are different: LEFT JOIN #ChkSumSourceTables ST ON TT.Name = ST.Name ... WHERE IsNull(ST.chksum, 0) <> IsNull(TT.chksum, 0)

Here's what I've done before, using the UNION idea (the full shape of the query is sketched below): SELECT 'TableA', * FROM TableA ... UNION ALL ... against the other table. It's worked well enough on tables that are about 1,000,000 rows, but I'm not sure how well that would hold up on extremely large tables. I've run the query against my system, which compares two tables with 21 fields of regular types in two different databases attached to the same server running SQL Server 2005; the table has about 3 million rows, and there are about 25,000 rows different.
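The general shape of that UNION comparison, as a sketch only: the column names id, col1, col2 below are placeholders for the real column list, which the post never gives. Rows that exist identically in both tables show up twice and drop out under HAVING COUNT(*) = 1; anything left is present in only one of the two tables (or differs between them), and MIN(TableName) shows which side it came from.

-- Sketch of the UNION-based comparison. The column names id, col1, col2 are
-- placeholders; substitute the actual columns of the two tables being compared.
SELECT MIN(TableName) AS TableName, id, col1, col2
FROM (
    SELECT 'TableA' AS TableName, id, col1, col2 FROM TableA
    UNION ALL
    SELECT 'TableB' AS TableName, id, col1, col2 FROM TableB
) AS combined
GROUP BY id, col1, col2
HAVING COUNT(*) = 1
ORDER BY id;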
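And here is a minimal sketch of how the checksum steps outlined above can be assembled into a runnable script. The SchemaName column, the SourceDb/TargetDb database names, and the @TableName parameter are assumptions added for illustration; the temp tables, the Name and chksum columns, and the CHECKSUM_AGG(BINARY_CHECKSUM(*)) expression come from the post.

-- Assumptions for illustration: SchemaName, SourceDb/TargetDb, and @TableName
-- do not appear in the original post.
DECLARE @TableName sysname;   -- if NULL, check them all; otherwise just the one
DECLARE @sql nvarchar(max);
SET @sql = '';

CREATE TABLE #ChkSumSourceTables (SchemaName varchar(250), Name varchar(50), chksum int);
CREATE TABLE #ChkSumTargetTables (SchemaName varchar(250), Name varchar(50), chksum int);

-- list the tables in the source and target databases
INSERT INTO #ChkSumSourceTables (SchemaName, Name)
SELECT S.name, T.name
FROM SourceDb.sys.tables T
INNER JOIN SourceDb.sys.schemas S ON T.schema_id = S.schema_id
WHERE T.name LIKE COALESCE(@TableName, '%');

INSERT INTO #ChkSumTargetTables (SchemaName, Name)
SELECT S.name, T.name
FROM TargetDb.sys.tables T
INNER JOIN TargetDb.sys.schemas S ON T.schema_id = S.schema_id
WHERE T.name LIKE COALESCE(@TableName, '%');

-- build a dynamic SQL statement that records one aggregate checksum per table
SELECT @sql = @sql
    + 'UPDATE #ChkSumSourceTables SET chksum = (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM SourceDb.'
    + QUOTENAME(SchemaName) + '.' + QUOTENAME(Name) + ') WHERE Name = ' + QUOTENAME(Name, '''') + '; '
FROM #ChkSumSourceTables;

SELECT @sql = @sql
    + 'UPDATE #ChkSumTargetTables SET chksum = (SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TargetDb.'
    + QUOTENAME(SchemaName) + '.' + QUOTENAME(Name) + ') WHERE Name = ' + QUOTENAME(Name, '''') + '; '
FROM #ChkSumTargetTables;

-- execute dynamic statements - populate temp tables with checksums
EXEC (@sql);

-- compare the two databases to find any checksums that are different
SELECT TT.Name, ST.chksum AS source_chksum, TT.chksum AS target_chksum
FROM #ChkSumTargetTables TT
LEFT JOIN #ChkSumSourceTables ST ON TT.Name = ST.Name
WHERE ISNULL(ST.chksum, 0) <> ISNULL(TT.chksum, 0);

DROP TABLE #ChkSumSourceTables;
DROP TABLE #ChkSumTargetTables;

Note that BINARY_CHECKSUM can collide, so a matching checksum is strong evidence rather than proof that two tables are identical; a mismatch, on the other hand, always means the data differs.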
My solution involves the use of dependent subqueries:

(select count(*) from posts where posts.user_id = users.user_id) as post_count,
(select count(*) from pages where pages.user_id = users.user_id) as page_count

To test performance differences, I loaded the tables with 16,000 posts and nearly 25,000 pages. Limited testing showed nearly identical performance between this query and your query using LEFT JOINs to SELECT subqueries. Your updated, simpler method took over 2,000 times as long (nearly 3 minutes). Using EXPLAIN with each of the queries shows that both of your approaches involve a filesort, which is avoided with my query. Adding a key on user_id to the posts and pages tables avoids the filesort and sped the slow query up to only 18 seconds; that is still significantly slower than the other two queries. I do believe my approach is a bit easier to follow. I ran across this while trying to perform a similar task with a query containing about a dozen columns; more columns also required adding to the GROUP BY portion of the query. The test schema, with the full query sketched after it, was:

CREATE TABLE users (user_id INT PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20));
INSERT INTO users (name) VALUES ('Matt');
INSERT INTO users (name) VALUES ('Simon');
INSERT INTO users (name) VALUES ('Jen');
CREATE TABLE posts (post_id INT PRIMARY KEY AUTO_INCREMENT, user_id INT);
CREATE TABLE pages (page_id INT PRIMARY KEY AUTO_INCREMENT, user_id INT);
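Assembled against that schema, the dependent-subquery approach looks roughly like this. Only the two correlated COUNT(*) expressions appear verbatim above; the outer SELECT list and the ORDER BY are filled in as assumptions.

-- Only the two correlated COUNT(*) subqueries come from the answer above;
-- the outer query and ORDER BY are illustrative.
SELECT users.user_id,
       users.name,
       (SELECT COUNT(*) FROM posts WHERE posts.user_id = users.user_id) AS post_count,
       (SELECT COUNT(*) FROM pages WHERE pages.user_id = users.user_id) AS page_count
FROM users
ORDER BY users.user_id;

Each subquery is re-evaluated per user row and filters only on user_id, so an index on posts.user_id and pages.user_id (the key mentioned above) keeps those lookups cheap as the tables grow.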