One of our core value propositions at DB Best is to help our clients lower the costs of running their enterprise-level applications and be more productive. After 10 years in the database migration business, we now have top experts, smart tools, and an established methodology that allow us to seamlessly switch apps over to Microsoft products. Let us give you an example of a very successful 3-week Siemens Teamcenter Oracle to Microsoft SQL Server migration project.
In this video, our CEO, Dmitry Balin, talks about a database migration for a manufacturing company in Sweden.
Take the Next Step
If you’re looking for assistance with your apps and databases, we at DB Best will work our magic to understand your requirements and deliver the most cost-effective, scalable solution (and fast, too). Contact us today.
Many of our customers ask: “If my application runs and I’m happy, why do I need to migrate?” Being in the database migration business for over 10 years now, we’ve found that a value proposition based on hardware savings, licensing advantages, and professional relationships – used together or separately – makes customers interested in moving to the SQL Server platform.
As you probably know, our key migration offerings include conversion of Oracle, Sybase ASE, Sybase ASA, DB2, DB2 UDB, MySQL, Informix, Access applications and databases to SQL Server 2005/2008. Watch this video to learn more about our migration package:
So now we at DB Best have a packaged migration offering that helps us look at this whole end-to-end migration as one project. As a part of the service, we offer Portfolio Assessment, a 12-step migration methodology that allows us to estimate the cost and effort required for application migration. It also enables us to scan the entire environment to do a quantitative analysis of the database. Based on the findings, we deliver a detailed document – internally, we call it Technical Roadmap – that describes all the steps involved in the migration of that database to SQL Server.
Want to save on data management?
Want a better understanding of what’s involved in database migration? Contact us to get started.
My last blog (Column Oriented Database Technologies) discussed the differences between row- and column-oriented databases and some key players in this space. Concepts and technologies around Big Data have been discussed in previous blogs (Big Data & NoSQL Technologies and NoSQL vs. Row vs. Column). From these blogs one should surmise that deciding upon the best database technology (or DBMS vendor) really depends on schema complexities, how you intend to retrieve your data, and how to get it there in the first place. We're going to dive into this next, but before we do it is imperative that we briefly examine the differences between OLTP and OLAP database designs. Let's leave OLTP details for a future blog, as I expect most readers already know plenty about transactional database systems. Instead we'll focus here on OLAP details and how we process Big Data using ETL technologies for data warehouse applications.
Generally the differences between OLTP and OLAP database applications center upon how frequently data must be stored and retrieved, the integrity of that data, how much of it there is, and how fast it grows. OLTP database schemas are optimized for processing transactions by an increasing number of users, while OLAP database schemas are optimized for aggregations against an increasing amount of data and exponential query permutations. Design considerations involve normalization, indexing, datatypes, user load, storage requirements, performance, and scalability. We will need to defer the many interesting details on these considerations for a future blog.
On-Line Transactional Processing (OLTP) applications, like Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Corporate Financials (AR/AP/GL), e-Commerce, or other enterprise systems use traditional SQL queries that are either embedded within application code (not a great practice in my humble opinion) or within stored procedures (a much better practice) in order to store and retrieve data. Where data integrity and performance are critical, an OLTP database is the most appropriate. OLTP schema designs are generally highly normalized, comprehensive structures that are optimized for fast transactional data processing. Typically they represent the source data that feed a data warehouse.
On-Line Analytical Processing (OLAP) applications, however, focus on Data Warehouse and Business Intelligence systems where the volume of data can grow quite large. Performance and scalability become the driving factors, and data integrity is generally inherited from the source system, reducing its importance. Typically OLAP schema designs follow well-known modeling practices like Ralph Kimball’s Star Schema or Dan Linstedt’s Data Vault (read my blog Data Vault – What is it?). Perhaps the best way to think of an OLAP system is to think of the wizard behind the curtain who has all the answers, for everyone, all the time. Boy, if only it were true! Instead, OLAP schema designs are generally refactored data structures based upon source OLTP systems that are optimized for fast query processing.
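As a toy illustration of the star-schema idea, a central fact table holds measures keyed to small dimension tables, so aggregations become simple lookups and roll-ups. All table names and values below are hypothetical, for illustration only:

```python
# Minimal star-schema sketch: one fact table plus two dimension tables.
# Names and data are hypothetical, for illustration only.
dim_product = {1: "Widget", 2: "Gadget"}   # product_id -> product name
dim_date = {10: "2012-Q1", 11: "2012-Q2"}  # date_id -> quarter label

# Fact rows: (product_id, date_id, sales_amount)
fact_sales = [
    (1, 10, 100.0),
    (2, 10, 250.0),
    (1, 11, 175.0),
]

# Aggregate sales by product name -- the classic OLAP roll-up.
totals = {}
for product_id, date_id, amount in fact_sales:
    name = dim_product[product_id]  # dimension lookup
    totals[name] = totals.get(name, 0.0) + amount

print(totals)  # {'Widget': 275.0, 'Gadget': 250.0}
```

The point is that the fact table stays narrow and append-only while the dimensions carry the descriptive attributes, which is what makes the query permutations manageable.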
And so we arrive at the crux of this blog. How do we get data from the source OLTP system into the target OLAP data warehouse in a practical, efficient way? And what do we need to do to that data to conform and resolve the clearly different schema designs? So along comes ETL, or Extract-Transform-Load; fundamentally understandable, but potentially very hard to do once you dive into all the complexities involved; like peeling an onion or melting the wicked witch! Let’s examine what is really involved.
Extracting data from a source dataset, transforming that data in potentially many particular ways, and loading the resulting data into a target dataset is the essence of ETL. Understanding that is simple. Yet consider that the source data may originate from several different files, tables, views, or databases; furthermore, these potentially vary in structure, location, and host system (e.g., Oracle, MS SQL Server, MySQL, etc.). Also consider that transformations can include a myriad of different requirements, from normalization to de-normalization of source data, lookups from other datasets, merging, sorting, truncating, datatype conversion, inner joins, outer joins, matched and/or unmatched records, etc. These requirements range anywhere from simple to daunting. Consider the data load: perhaps there is one target, perhaps many; data might reside in a data warehouse and/or data marts; maybe it's a Star Schema, maybe not! Finally, the entire ETL data process may handle a full data set or an incremental one. OMG! ETL data processing permutations are endless, and one thing is certain: getting it right is critical!
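A bare-bones sketch of that extract-transform-load flow, showing a datatype conversion, a lookup, and the capture of bad records (the source rows and the lookup table are hypothetical):

```python
import csv
import io

# --- Extract: read rows from a (simulated) CSV source.
source = io.StringIO("id,amount,country\n1,19.99,US\n2,5.00,DE\n3,bad,US\n")
rows = list(csv.DictReader(source))

# --- Transform: datatype conversion, a lookup, and filtering of bad records.
country_lookup = {"US": "United States", "DE": "Germany"}  # hypothetical lookup table
clean, rejects = [], []
for row in rows:
    try:
        clean.append({
            "id": int(row["id"]),
            "amount": float(row["amount"]),             # datatype conversion
            "country": country_lookup[row["country"]],  # dimension-style lookup
        })
    except (ValueError, KeyError):
        rejects.append(row)  # capture unparseable records instead of failing

# --- Load: here we just collect the rows; a real job would bulk-insert them.
target = sorted(clean, key=lambda r: r["id"])
print(len(target), len(rejects))  # 2 1
```

Every real ETL job is some elaboration of these three phases; the complexity lives almost entirely in the transform step.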
What should we do then to deal with these comprehensive and complex data processes?
We use tools of course!
Some may build their ETL data processing with SQL scripts, or with SQL embedded in a scripting language like PHP or Python. Others may actually craft programs in high-level languages like C#, Java, or Visual Basic. These solutions are fine, of course, but they present a cumulative burden of development, maintenance, and cost that can easily exceed the alternative of using tools designed specifically for ETL data processing. Not that there is no burden if you use ETL tools, but a greatly reduced one, in my humble opinion. There are several ETL tool vendors out there, all providing the various functionalities needed. I found this survey result, which is about what I would expect.
Keep in mind however that there are several other key features that should be considered when looking at ETL tools, including:
Task Scheduler & Triggers
Environment Control (i.e., DB Connections, Global Variables)
Project, User, Task Management
Integrated Source Control
Task Execution Analysis
Monitoring & Logging
Real-Time execution logging
File and Database error capture
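The last two features in the list — real-time execution logging and error capture — can be sketched roughly like this. The row format and rejection rule are illustrative assumptions, not any particular tool's API:

```python
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

def load_rows(rows):
    """Load rows one by one, logging progress and capturing failures
    instead of aborting the whole batch."""
    loaded, errors = 0, []
    for i, row in enumerate(rows):
        try:
            if row.get("amount") is None:
                raise ValueError("missing amount")  # hypothetical validation rule
            loaded += 1
        except ValueError as exc:
            errors.append((i, str(exc)))  # file/database error capture
            log.warning("row %d rejected: %s", i, exc)
    log.info("loaded %d rows, %d errors", loaded, len(errors))
    return loaded, errors

loaded, errors = load_rows([{"amount": 1.0}, {"amount": None}, {"amount": 2.5}])
```

Capturing errors row by row, rather than letting one bad record kill the batch, is what makes overnight loads survivable.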
Using ETL tools with the appropriate features and usability is very important, yet there is one more aspect I submit must be considered before crafting any ETL process: Data Warehouse schema design! All the objects involved within a Data Warehouse are really driven by the overarching architecture. This aspect is easily overlooked or undervalued. Many just slap something together and hope for the best. Maybe this contributes to the nearly 80% failure rate of first DW/BI project efforts. Let’s take a step back then and look at this from 35,000 feet for a moment.
Ok, we all know that an OLAP Data Warehouse commonly supports analytics and reporting, often referred to as Business Intelligence. We typically understand what our data sources are and that we will use ETL data processes to move data through a system that in turn must be a sustainable, pliable, expandable DW/BI solution. Yet what is the architecture of that system? Do you notice we started from a comfortable position of known facts and quickly arrived at a place of genuine uncertainty about what will be constructed? The good news is that we do know what the results should be.
Consider carefully, then, the goals and requirements of the business and the metrics involved, and then decide what the end data stores should be. From there you can fill in the gaps; ask yourself these many questions:
What are the business metrics and how should they be defined?
What permutations are involved? (i.e., reporting periods, frequency, filters, etc.)
Should the target data stores be column, or row based?
Should Star Schemas or Data Vaults be used? and when?
What, if any, Data Marts would be employed?
Do they inter-relate or are they stand-alone?
How much source data exists already?
How big will the target data grow? and how fast?
What are the expected complexities of the transformations involved?
How about hardware and software requirements?
Optimizations? Scalability? Functionality? ~~ you get the picture …
I surmise that for anyone who stops to think about the architecture involved, many more questions will easily come to mind. The more questions you can consider and answer regarding any DW/BI system architecture, the better; and sooner is better than later. One principle I always try to employ in any design I craft is that change is inevitable, so DW/BI system architectures must be pliable and extensible. Achieve this in your solution and you’re at least halfway there. Just remember to click your heels together three times first, close your eyes, and repeat: there’s no place like home!
One final good practice common to many DW/BI system architectures is to stage data first, before it goes into the data warehouse. Often called an ODS, or Operational Data Store, having a place to prepare, or stage, data before it moves into the data warehouse can be a tremendous advantage to any DW/BI system architecture. It does introduce an additional ETL step, but it is usually quite worth it. A decent ODS will contain a reduced dataset from the source with few to no transformations, thus eliminating data you know you don’t need. Then the next ETL step does the real work of transforming and eventually loading into the data warehouse.
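The two-step flow described above — a light extraction into the ODS, then the real transformation into the warehouse — might be sketched like this (all data and rules are hypothetical):

```python
# Step 1: source -> ODS. Reduce the dataset (drop columns and rows we know
# we don't need) with few to no transformations.
source_rows = [
    {"id": 1, "amount": 10.0, "status": "ok",   "debug_blob": "..."},
    {"id": 2, "amount": 20.0, "status": "void", "debug_blob": "..."},
    {"id": 3, "amount": 30.0, "status": "ok",   "debug_blob": "..."},
]
ods = [{"id": r["id"], "amount": r["amount"]}
       for r in source_rows if r["status"] == "ok"]  # filter + trim only

# Step 2: ODS -> data warehouse. The heavier transformation work happens here.
warehouse = [{"id": r["id"], "amount_cents": int(r["amount"] * 100)}
             for r in ods]
print(warehouse)
```

Splitting the work this way keeps the expensive transformations off the source system and gives you a checkpoint to validate against before anything touches the warehouse.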
ETL data processing provides an essential methodology for moving data from point A to point B. Crafting these processes ranges from straightforward to highly complex. I often wonder how we all got by with SQL scripting before ETL tools came along; today, when faced with a data migration and/or data warehouse population task, I break out these tools and get to work. Best advice: follow the yellow brick road…
As this topic is quite extensive, my next blog, “Extract-Transform-Load (ETL) Technologies – Part 2”, will focus on ETL vendors so we can examine who the key players are and what they offer.
As always, don’t be afraid to comment, question, or debate… I learn new things every day!
We have a new product in our database migration lineup – MySqlMigrator, a neat tool that helps you transfer data from MySQL tables to existing SQL Server 2008/2012 tables in a few steps. It comes with a wizard-based interface that allows restarting migration from any step. The tool uses the bulk copy mechanism for loading data from MySQL tables to SQL Server.
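The bulk-copy idea the tool relies on — sending rows in batches instead of issuing one insert per row — can be sketched generically. The batch size and the `insert_batch` callback here are illustrative assumptions, not MySqlMigrator internals:

```python
def bulk_copy(rows, insert_batch, batch_size=1000):
    """Send rows to the target in batches rather than one INSERT per row."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            insert_batch(batch)  # one round trip per batch, not per row
            batch = []
    if batch:
        insert_batch(batch)      # flush the final partial batch

# Usage: collect the batches a 2500-row transfer would produce.
calls = []
bulk_copy(range(2500), calls.append, batch_size=1000)
print([len(b) for b in calls])  # [1000, 1000, 500]
```

Cutting the number of round trips is what makes bulk loading dramatically faster than row-at-a-time inserts.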
Here’s a quick video tutorial to get you up to speed on what MySqlMigrator can do for you:
In 2007 I worked for a company which provided data warehousing solutions. Our biggest client database was about 2-3 TB, which was not that much even five years ago, but because of extremely complicated ETL and complex reports we had a lot of trouble.
We spent hundreds of hours monthly on code optimization, but every 30 minutes saved on ETL and processing was wiped out by monthly data growth, and we had to start over and over.
Finally the time came when we couldn’t improve anything else in the programming code, and the company decided to drastically upgrade the hardware. But extremely expensive servers and disk storage gave us only a few percent of performance improvement. We thought about migrating to another platform, but preliminary estimates showed that every traditional RDBMS has the same bottleneck: disk storage. If it were happening today, we would probably turn to fundamentally new technologies.
The whole software development industry is highly innovation-oriented, but RDBMSs are always perceived as something very conservative. I still have a line about ANSI-92 SQL knowledge in my CV, and almost all the fundamental works from the 70s are still up to date. In spite of that, the data management software industry is developing very fast, and great new products come to market every year.
In-memory database (IMDB) is a database management system that stores data entirely in main memory. It is contrasted with database management systems which employ a disk storage mechanism.
The idea is not so fresh. Actually, the first IMDB was created in 1991 (so they are even older than the ANSI-92 entry in my CV), but in twenty years something has changed.
Compared with 1991, RAM has become 10,000 times cheaper and much faster. On this evidence, in-memory database systems (which for a long time were used primarily in RTOSes and other specialized embedded systems) can now serve as databases for e-commerce, social networking, and of course business intelligence solutions. You can keep terabytes of data in main memory – that’s the low end of the Big Data scale.
So, IMDBs still don’t have a big market share, but in recent years the industry’s major companies have been paying more attention to this technology. One of them, SAP AG, which recently acquired Sybase, released an in-memory database called HANA in 2010; according to several industry experts, it is now front and center of SAP’s agenda. And it is the main subject of my post.
What Problems Does SAP HANA Solve?
What is SAP HANA, Anyway?
First of all, it is an in-memory database, with all the corresponding benefits and disadvantages. As for benefits, obviously, working with all the data in memory is way faster than writing to and reading from a file system (in certain circumstances, 100x or even 1000x faster). Well, regular memory-mapped files are very fast too, but an in-memory database is still a database. HANA is fully ACID-compliant and supports multi-user access, a high-level data definition language, several programming interfaces, SQL and MDX, stored procedures and triggers, and many other traditional RDBMS tools and mechanisms.
The common problem with ACID for most IMDBs is the “D”. Main memory is volatile storage and loses its content when power is lost. To avoid this, HANA has a persistence layer component. Each committed transaction generates a log entry that is written to non-volatile storage. HANA also stores changed pages in savepoints, which are asynchronously written to persistent storage at regular intervals (by default, every 5 minutes). The persistence layer is also used for whole-database backup and restore.
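In rough terms, the log-plus-savepoint scheme works like this. This is a toy sketch of the general write-ahead-log idea, not HANA's actual implementation:

```python
class InMemoryStore:
    """Toy in-memory store: every committed change is appended to a log,
    and a savepoint periodically snapshots the full state. Recovery =
    last savepoint + replay of log entries written after it."""

    def __init__(self):
        self.data = {}               # the in-memory state
        self.log = []                # durable, append-only change log
        self.savepoint = ({}, 0)     # (snapshot, log position)

    def commit(self, key, value):
        self.log.append((key, value))  # write the log entry first
        self.data[key] = value         # then apply the in-memory change

    def take_savepoint(self):
        self.savepoint = (dict(self.data), len(self.log))

    def recover(self):
        snapshot, log_pos = self.savepoint
        restored = dict(snapshot)
        for key, value in self.log[log_pos:]:  # replay the tail of the log
            restored[key] = value
        return restored

db = InMemoryStore()
db.commit("a", 1)
db.take_savepoint()
db.commit("b", 2)      # committed after the savepoint; recovered via log replay
print(db.recover())    # {'a': 1, 'b': 2}
```

Savepoints bound the amount of log that must be replayed after a crash, which is why they are taken at regular intervals rather than once.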
I am not going to describe the whole IMDB architecture and its common principles here, so let’s focus on specific HANA features:
HANA ships as a pre-configured appliance from a hardware vendor, and the license is bought from SAP.
HANA is positioned as a single system for both OLTP and OLAP purposes and supports both columnar and row-based storage organization, so you can enjoy the benefits of each approach simultaneously. HANA even allows joining row-based tables with column-based tables. You can also alter an existing table from row-based to columnar and vice versa.
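The row-versus-columnar distinction that makes this dual support interesting can be sketched in a few lines (hypothetical data, illustration only):

```python
# Row-based layout: each record stored together -- natural for OLTP
# point lookups and single-record updates.
row_store = [
    {"id": 1, "amount": 10.0, "region": "EU"},
    {"id": 2, "amount": 20.0, "region": "US"},
]

# Column-based layout: each attribute stored together -- natural for
# OLAP scans and aggregations that touch only a few columns.
col_store = {
    "id": [1, 2],
    "amount": [10.0, 20.0],
    "region": ["EU", "US"],
}

# An aggregate reads just one contiguous column in the columnar layout...
total = sum(col_store["amount"])
# ...while fetching a whole record is a single access in the row layout.
record = row_store[0]
print(total, record["region"])  # 30.0 EU
```

A system offering both layouts lets you keep hot transactional tables row-based and analytical tables columnar, without moving data between products.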
HANA supports parallel execution.
HANA is fully integrated with SAP BusinessObjects, and other client tools like MS Excel, the Dashboard Design Tool, etc. can also access HANA directly.
HANA supports ODBC, JDBC and ODBO drivers.
HANA has its own scripting language, SQLScript, which is designed to enable optimizations and parallelization and is a collection of data, functional, and procedural extensions.
I can continue the list, but maybe it would be better just to give you some useful links:
SAP provides 30-day trial access to a hosted HANA test and evaluation environment. Access is provided through remote desktops (by default you are offered a connection directly from a web browser, but it’s much more convenient to use the traditional way), so you don’t have to install anything on your desktop. You can also read a great blog about it before you start.
I have been using this trial access for about a week. I’ve tried to reproduce some of the problems we had to solve with our client database from the beginning of my post, and the results were sometimes very impressive. But of course it’s too early to draw conclusions.