By now, you’ve probably heard about SVForum’s upcoming Launch: Silicon Valley 2012 on June 5. Will you be there? You can meet our CEO Dmitry Balin and our social commerce platform, Pikaba (Pikaba.com), among the specially selected startups launching their products in front of VCs, angels, executives, press and television. (See the complete list of selected demonstrating companies for Launch: Silicon Valley.)
The event, to be held at Microsoft’s Mountain View Campus, is designed to foster innovation, entrepreneurship and leadership within the Silicon Valley ecosystem of individuals and businesses participating in emerging technologies.
The event provides the critical networking, insights and services needed for startup success.
SVForum has a great roster of panels and guest speakers, including Steve Blank, Vinod Khosla, NVCA Chair Ray Rothrock, futurist Paul Saffo, super angel Jeff Clavier, NEA Partner Pete Sonsini and more than 20 other panelists and guest speakers.
SVForum’s partners also include global leaders Accenture, Citrix, Deloitte, Genentech, HP, IBM, Microsoft, Nokia and SAP, as well as leading venture capital, law and accounting firms, and global trade organizations.
The hype and disinformation that stubbornly prevail in the data warehouse world today prompt me to raise the debate to a rational level. Let’s set aside 3NF and STAR schemas for a moment, along with the many flavors of analytics and all their technologies. Let’s also temporarily ignore e-commerce, database migrations, business intelligence, and data collection and processing systems. Instead, let’s look at three different data storage methodologies. These are:
NoSQL - very new, heavily hyped, and really short for ‘NOT ONLY SQL’
ROW - your traditional record database, well known and loved
COLUMN - still relatively new, widely misunderstood, yet still feels like normal SQL
To look at these three together I think we must first look at them separately. So here goes…
The ROW based database storage methodology is the one most of us are already familiar with. Depending upon your vendor of choice (Oracle, Microsoft, MySQL, DB2, etc.), DDL and DML syntax creates tables that store and retrieve records, largely based upon some form of key, be it natural or surrogate (let’s debate the many issues of schema design another time). The relational data model thrives upon the ROW based database, which is widely used for many OLTP and OLAP systems and applications. Highly efficient with complex schema designs and SQL queries, ROW based database engines offer a tried and true way to build solid solutions. We should not throw this away, and I won’t!
The COLUMN based database storage methodology has been around for a while as an alternative to ROW based databases, offered by various vendors (InfoBright, Vertica, Sybase IQ, etc.). Generally the DDL and DML syntax is similar to that of ROW based databases, yet under the hood things are usually radically different, and much more efficient for processing aggregations. This is the main thing that sets it apart from ROW based engines. Some of these column based technologies also provide high data compression, which allows for a much smaller disk footprint: in some cases as much as 10:1 compared with their row based counterparts. We should adopt this where appropriate, and I am!
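To make the row-versus-column difference concrete, here is a toy Python model. It is an illustration only, not how InfoBright, Vertica or Sybase IQ are actually implemented. It stores a small, hypothetical sales table both ways, sums one field, and run-length encodes one column. Run-length encoding is just one of several compression techniques a column engine may use; it is assumed here because it is easy to show.

```python
# Hypothetical sales table: (region, product, amount)
rows = [
    ("west", "widget", 120),
    ("west", "gadget", 80),
    ("west", "widget", 200),
    ("east", "widget", 50),
    ("east", "gadget", 75),
]

def total_row_store(records):
    # Row layout: each record is stored together, so summing one field
    # still walks through every whole record.
    return sum(record[2] for record in records)

# Column layout: each field lives in its own contiguous array.
columns = {
    "region": [r[0] for r in rows],
    "product": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

def total_column_store(cols):
    # Only the "amount" column is touched; region and product are never read.
    return sum(cols["amount"])

def run_length_encode(values):
    """Compress a column into [value, count] runs; repetitive columns shrink a lot."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

print(total_row_store(rows))                 # 525
print(total_column_store(columns))           # 525
print(run_length_encode(columns["region"]))  # [['west', 3], ['east', 2]]
```

The column version reads only the `amount` values, and the `region` column collapses into two runs because neighboring values repeat. How much real data compresses depends entirely on how repetitive it is.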
The NoSQL based storage methodology (notice I don’t call it a database) is the new kid on the block, with many vendors vying for your attention (Hadoop, Cassandra, MongoDB, etc.). Many people view NoSQL technology as the replacement for ROW or COLUMN based databases, but let me say right off: this is the wrong way to think of NoSQL. Instead, think of it as a highly optimized, highly scalable, high performance distributed file system. Even so, NoSQL storage offers striking features that are simply not practical with ROW or COLUMN databases.
Let’s however be very clear about what NoSQL is.
While there are three main variants (which I will cover shortly), NoSQL technologies address narrow yet important business needs. Most NoSQL vendors support structured, semi-structured, or unstructured data, which can be very useful indeed. The real value, I believe, comes from the fact that NoSQL can ingest HUGE amounts of data, very fast. Forget gigabytes, and even terabytes; we are talking petabytes! Gobs and gobs of data! With clustering support and multi-threaded inner workings, scaling to meet the expected explosion of data will seem a no-brainer with a NoSQL environment in play. Let’s get excited, but temper it with the understanding that NoSQL is COMPLEMENTARY, not COMPETITIVE, to ROW and COLUMN based databases. Also note that NoSQL is NOT A DATABASE but a high performance distributed file system that is really great at dealing with lots and lots of data; did I say BIG DATA!
I mentioned that there are three main variations of NoSQL. These include:
Key Value – which supports fast transactional inserts (like an internet shopping cart); generally stores data in memory and is great for web applications that need considerable in/out operations
Document Store - which stores loosely structured data as name/value pairs; great for web traffic analysis, detailed information, and applications that look at user behavior, actions, and logs in real time.
Column Store – which is focused upon massive amounts of unstructured data across distributed systems (think Facebook & Google); great for shallow but wide data relationships, yet fails miserably at ad-hoc queries
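The first two variants can be sketched with plain Python data structures. This is a toy, in-memory illustration of the access patterns only; the session id, SKUs and event fields are hypothetical, and real products add persistence, replication and indexing. The column-store variant is not shown.

```python
# Key-value: the store only understands "put/get by key".
kv_store = {}

def add_to_cart(session_id, sku, qty):
    """Put: merge an item into the cart value stored under session_id."""
    cart = kv_store.setdefault(session_id, {})
    cart[sku] = cart.get(sku, 0) + qty

# Document store: each document is a set of name/value pairs, and documents
# in one collection need not share the same fields.
events = [
    {"user": "u1", "action": "view", "page": "/shoes"},
    {"user": "u2", "action": "search", "query": "boots"},
    {"user": "u1", "action": "add_to_cart", "sku": "SKU-42", "qty": 1},
]

def find(collection, **criteria):
    """Return documents that contain every given field with the given value."""
    return [doc for doc in collection
            if all(k in doc and doc[k] == v for k, v in criteria.items())]

add_to_cart("session-1", "SKU-42", 2)
add_to_cart("session-1", "SKU-42", 1)
print(kv_store["session-1"])         # {'SKU-42': 3}
print(len(find(events, user="u1")))  # 2
print(find(events, query="boots"))   # [{'user': 'u2', 'action': 'search', 'query': 'boots'}]
```

The key-value side can only answer "what is stored under this key?", which is exactly what a shopping cart needs; the document side can filter on fields inside each document, which suits log and behavior analysis.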
So to bring these three very different data storage technologies into a single perspective, I think it behooves us all to consider that we essentially need all three. Regardless of what type of system is being built, I’ve always subscribed to the notion that one should use the right tool for the job. You just gotta know what those tools are!
In summary, let me say this: I believe that, generally speaking, each of these three data storage technologies offers specific features and therefore should be used in specific ways.
ROW based databases should prevail when you have a complex, but not too huge, data set that requires efficient storage and retrieval for OLTP and even some OLAP usage;
COLUMN based databases are clearly aimed at analytics; optimized for aggregations and coupled with huge data compression, they should be adopted for most business intelligence usage;
NoSQL based data solutions step in when you need to ingest BIG DATA, fast, Fast, FAST… and when you only really need to make simple correlations across the data quickly.
Well, there you have it… most of it, in my humble opinion anyway!
Document storage databases are all the buzz these days. Interestingly enough, they are not actually a very recent invention: object oriented databases using the very same concepts existed 20 years ago.
In this four-part mini blog series I will take a bird’s eye view of the connections and relationships between object oriented databases, object stores, serialization and document storage. I will present and explain the most common document storage formats and will try to find out why object stores are all the buzz today but were not 20 years ago.
Episode 1 – Introduction and definitions
Document Storage: In the context of software development, a document is, plain and simple, a computer data file or data set. For different purposes the data file or set might contain different content. Often, when the content of the document requires a predictable structure, metadata is introduced to define the formatting of the document’s data. In such a case you consider the document’s data to be structured; otherwise it is unstructured. An example of unstructured data is a Word document; an example of structured data is an XML document.
An application has different options for where to store its documents’ data. Most commonly, applications employ a combination of disk storage and memory caching. Obviously, the choice of storage will impact the performance and scalability of your application. More recently, there are also options to store the data in the cloud or on solid state disks (SSD).
Object Storage: Within the application, all object instances live in random access memory (unless they are memory mapped to, e.g., your swap file, which for now we assume they are not). If the power goes down, your random access memory will lose all its data, and therefore all of the application’s object instance data (amongst other data). If you want to hold on to that data you have to persist it; that is where Object Storage comes into play, as one option for persisting your application’s object instance data.
Most software and web development frameworks (for example .NET) provide ready-to-use boilerplate source code, or complete implementations of object interfaces for your custom classes, which enable you to easily persist their instance data to a document of a particular structure. Such code automatically takes care of finding your instance data and converting it to the appropriate format (e.g. for date/time data types). Some implementations even take enumerations, complex data types, object hierarchies and arrays into consideration. Implementations for binary format and maybe XML are common, but online you can find boilerplate source code to persist your objects to pretty much any format you have ever heard of.
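.NET's serializers are not shown here. As a hedged analogue, this Python sketch uses the standard `json` module to persist a hypothetical `Order` class. Because `json` cannot encode `datetime` values on its own, the date/time conversion the paragraph mentions is written out by hand as ISO 8601 text.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class Order:  # hypothetical custom class
    order_id: int
    customer: str
    placed_at: datetime

def to_document(order):
    """Serialize instance data to JSON, encoding the datetime as ISO 8601 text."""
    data = asdict(order)
    data["placed_at"] = order.placed_at.isoformat()
    return json.dumps(data)

def from_document(text):
    """Rebuild an Order, converting the ISO 8601 text back into a datetime."""
    data = json.loads(text)
    data["placed_at"] = datetime.fromisoformat(data["placed_at"])
    return Order(**data)

original = Order(7, "Ada", datetime(2012, 6, 5, 9, 30))
doc = to_document(original)
print(doc)  # {"order_id": 7, "customer": "Ada", "placed_at": "2012-06-05T09:30:00"}
print(from_document(doc) == original)  # True
```

Calling `json.dumps(asdict(original))` directly would raise a `TypeError`, which is exactly the kind of type conversion a framework's built-in implementation is meant to take off your hands.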
Some challenges for Object Storage are dealing with object versioning and object inheritance. Handling object references, rather than instances, can also be tricky. Don’t assume that those “advanced features” are implemented in all development frameworks. Always verify before using, and benchmark after coding: default implementations are often not the best performing ones.
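The reference problem is easy to demonstrate with Python's standard library, used here purely as an illustration (other frameworks behave differently). Two hypothetical person records share one address object: `pickle` preserves the shared reference within a single dump, while JSON silently turns it into two independent copies.

```python
import json
import pickle

shared = {"street": "1 Main St"}                  # one address object...
people = [{"name": "Ann", "address": shared},
          {"name": "Bob", "address": shared}]     # ...referenced twice

# pickle memoizes objects it has already written, so the shared
# reference survives the round trip.
via_pickle = pickle.loads(pickle.dumps(people))
print(via_pickle[0]["address"] is via_pickle[1]["address"])  # True

# JSON has no notion of references: the address is written out twice
# and comes back as two separate copies.
via_json = json.loads(json.dumps(people))
print(via_json[0]["address"] is via_json[1]["address"])      # False

# Changing one copy no longer affects the other.
via_json[0]["address"]["street"] = "2 Oak Ave"
print(via_json[1]["address"]["street"])                      # 1 Main St
```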
Serialization: According to Wikipedia “serialization is the process of converting a data structure or object state into a format that can be stored (for example, in a file or memory buffer, or transmitted across a network connection link) and “resurrected” later in the same or another computer environment.”
Serialization most commonly refers to what you have to do when you want to persist an object instance in your application onto a storage medium other than memory. The basic motivation for serialization is that, unlike random access memory, other storage types (like disk, cloud or SSD) do NOT provide a fast mechanism to access any byte, anywhere, at any time. Performance suffers heavily when trying to read, e.g., from a hard drive using random access. To make up for some of the natural performance inferiority of, e.g., hard drive storage, it is better to “stream” the data to and from the storage device in a sequential fashion. Therefore, at the end of a serialization you usually receive a structured document, which is suitable for “streaming” to its storage destination: document storage.
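As a minimal sketch of the "serialize, then stream sequentially" idea, the following writes hypothetical records as JSON Lines (one JSON document per line, a format chosen here for illustration) in a single front-to-back pass, then reads them back the same way.

```python
import json
import os
import tempfile

def write_stream(path, records):
    """Serialize each record to one JSON line and write the lines in order."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

def read_stream(path):
    """Read the file front to back, deserializing one record per line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)

records = [{"id": i, "name": f"item-{i}"} for i in range(3)]
path = os.path.join(tempfile.mkdtemp(), "items.jsonl")
write_stream(path, records)
print(list(read_stream(path)) == records)  # True
```

Neither function ever seeks: the storage device only sees one sequential write and one sequential read.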
Obviously, serialization comes at a price. Depending on which document storage format you choose, you will observe a different impact on CPU utilization and transmission bandwidth. Serialization tends to “bloat” the amount of data you have to transmit and store.
Sometimes memory mapped disk storage can be an alternative to serialization, but I believe this is not widely used. As SSDs become cheaper and faster, there may come a day when memory mapping becomes a viable alternative to serialization (I have pitched this idea to FusionIO but they didn’t seem to be impressed).
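How much bloat you get depends heavily on the format. This rough Python comparison encodes the same 100 hypothetical records once as a fixed binary layout with `struct` and once as JSON; exact JSON sizes will vary with the data.

```python
import json
import struct

# 100 hypothetical (id, value) records.
records = [(i, i * 1.5) for i in range(100)]

# Fixed binary layout: little-endian 4-byte int + 8-byte double = 12 bytes per record.
RECORD = struct.Struct("<id")
binary = b"".join(RECORD.pack(i, v) for i, v in records)

# Self-describing text: every record repeats its field names.
text = json.dumps([{"id": i, "value": v} for i, v in records]).encode("utf-8")

print(len(binary))              # 1200
print(len(text) > len(binary))  # True
```

The binary form is smaller because field names and number formatting are not repeated per record, but it is not self-describing: a reader must already know the `<id` layout, which brings the versioning problem back.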
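Python's `mmap` module gives a feel for the memory-mapped alternative. This sketch assumes a hypothetical fixed-size record (an integer id plus a double): fields are written straight into a mapped file and read back by offset, with no separate serialize-then-stream step. It only works this simply for flat, fixed-size data.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<id")  # hypothetical fixed-size record: int id + double value
COUNT = 4

path = os.path.join(tempfile.mkdtemp(), "records.bin")
with open(path, "w+b") as f:
    f.truncate(RECORD.size * COUNT)       # reserve space; an empty file cannot be mapped
    with mmap.mmap(f.fileno(), 0) as mm:  # length 0 maps the whole file
        # Write fields directly into the mapped bytes.
        for i in range(COUNT):
            RECORD.pack_into(mm, i * RECORD.size, i, i * 0.5)
        # Random access by offset: jump straight to record 2.
        print(RECORD.unpack_from(mm, 2 * RECORD.size))  # (2, 1.0)
        mm.flush()                        # ask the OS to write the dirty pages to disk
```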
Common Document Storage Formats: As mentioned before, there are many different document formats for structured document storage. Probably the most famous are XML and binary. With MongoDB becoming a “household document store”, JSON and BSON are also becoming more widely known (yes, I know, the JavaScript programmers out there will disagree with me that it took MongoDB to make JSON famous). A more exotic one is Protocol Buffers, a very compact, binary, structured document storage format introduced and used by Google.
(JSON sample document)
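For illustration, here is a hypothetical customer record encoded both as JSON and as XML with Python's standard library. JSON maps directly onto in-memory dictionaries and lists, while XML requires a mapping decision; the choice below (`id` as an attribute, each tag as a repeated element) is just one possibility.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical customer document, expressed once as a Python dict.
customer = {"id": 42, "name": "Ada", "tags": ["vip", "newsletter"]}

# JSON: dicts and lists become objects and arrays with no extra decisions.
as_json = json.dumps(customer)

# XML has no built-in list or number types, so the mapping is a choice:
# here the id becomes an attribute and each tag a repeated <tag> element.
root = ET.Element("customer", id=str(customer["id"]))
ET.SubElement(root, "name").text = customer["name"]
for tag in customer["tags"]:
    ET.SubElement(root, "tag").text = tag
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)  # {"id": 42, "name": "Ada", "tags": ["vip", "newsletter"]}
print(as_xml)   # <customer id="42"><name>Ada</name><tag>vip</tag><tag>newsletter</tag></customer>
```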
Preview of next week’s Episode 2: In my next blog post I will look more closely at the XML and binary document formats. I will also start to look at different aspects of storage formats in general that have an impact on performance and scalability, and provide reasons why one format might be more suitable for a certain use case than another.
This is my first blog post ever. I started thinking about the importance of blogging and social media a long time ago and made a decision to start blogging very soon. It took me only two years to get to my first post. Well, better late than never.
It is very challenging to juggle my super busy life, where I balance running the business, raising the kids and living life… Now I’ll have to blog regularly on top of it… Why should I? For a very simple reason. I have come to the conclusion that our lives are becoming digitized to a very high degree. Even people who are not technical at all get more and more attached to their gadgets and perceive the world through their screens. It is a world where Google rules and Facebook is super dominant. In order to be noticed we need to show up in this digital world and tell the world that we exist. I’ve got a goal of growing my business 10x over the next few years and I want everyone to know that!!!
In this blog you will see my random thoughts about the challenges of running a business, balancing it with family life, and doing my part as a humanitarian. I love my job and my family, and I have learned to love the crazy busy schedule. Stay tuned and I’ll share my insights on all of it…
Still deciding between (Android/iOS) apps and a mobile website? There are definitely markets for both. Another important question is how to overcome the most common development pitfalls, including priorities, usability (think checkout), and SEO.
For this article, we hand-picked the most useful resources to narrow your search and to make sure your mobile development will be on the right track.
Research and Analysis
1. Mobile commerce is expected to reach $31 billion by 2016. While this represents a compound annual growth rate of 39% from 2011 to 2016, mobile commerce is only expected to be 7% of overall eCommerce sales by 2016. MOBILE COMMERCE FORECAST: 2011 TO 2016, Forrester.
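The forecast's two numbers also pin down a starting point. Assuming the 39% rate compounds over the five years from 2011 to 2016, this small calculation backs out the implied 2011 base. The result is derived arithmetic, not a separately reported figure.

```python
def cagr(start, end, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1

def implied_start(end, rate, years):
    """Back out the starting value that grows to `end` at `rate` per year."""
    return end / (1 + rate) ** years

base_2011 = implied_start(31.0, 0.39, 5)   # billions of USD, 2011 -> 2016
print(round(base_2011, 1))                 # 6.0
print(round(cagr(base_2011, 31.0, 5), 2))  # 0.39
```

In other words, 39% a year for five years multiplies the market roughly 5.2 times, taking about $6 billion to $31 billion.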
2. eBay Forecasts $8B In Mobile Commerce Volume In 2012; PayPal Will Reach $7B. TechCrunch.
3. The number of U.S. smartphone subscribers surpassed the 100-million mark in January, up 13 percent since October to 101.3 million subscribers. Google Android ranked as the top smartphone platform with 48.6 percent market share (up 2.3 percentage points) followed by Apple with 29.5 percent market share (up 1.4 percentage points). RIM ranked third with 15.2 percent share, followed by Microsoft (4.4 percent) and Symbian (1.5 percent). Source: comScore MobiLens.
5. Digital content purchases, such as music, eBooks, TV episodes and movies, were the most popular mobile purchases in September, with 47 percent of smartphone purchasers buying these items. Thirty-seven percent purchased clothing or accessories directly from a retailer, while 35 percent of purchasers bought event tickets. Slightly more than one in three mobile purchasers bought daily deals and gift certificates on their device during the month. Digital Goods, Clothing/Accessories and Tickets among the Most Popular Mobile Purchases, comScore.
10. While customers want optimised mobile sites and apps, they also want to keep the key functionality they are used to from the main site. Registration is bad on desktop, but worse on mobile. Customers don’t like to register before checkout, and retailers that have dropped registration have seen the benefits. Checkouts need to be designed for mobile. Mobile website checkouts: best and worst practice, Econsultancy.
11. Retailers need to move away from chasing the latest trend and toward making changes to their online channels that will help customers through the final stages of the booking funnel. In the mobile space, retailers need to be open to learning and optimising based on customer feedback. Making the most of mobile commerce opportunities in 2012, Econsultancy.
12. Having a mobile version of a website is not enough. Consumers are often left disappointed with the experience of such sites. Retailers need to think about what mobile shoppers require and try to deliver it in a simple package. Large buttons, which can easily be tapped with a finger, are recommended over small icons, and an extremely simplified menu should be the aim every time. Consumers Frustrated with Mobile Commerce, Huffington Post.
13. Beyond the low penetration, it’s also important to note that within that small percentage, the share of people who actually pay for, download and use apps is even smaller. The Mobile Imperative.
Traditional mobile phones: Phones with browsers that cannot render normal desktop webpages. This includes browsers for cHTML (iMode), WML, WAP, and the like.
Smartphones: Phones with browsers that render normal desktop pages, at least to some extent. This category includes a diversity of devices, such as Windows Phone 7, BlackBerry devices, iPhones, and Android phones, and also tablets and eBook readers. We can further break down this category by support for HTML5:
16. The best Mobile SEO strategy is to not have a mobile SEO strategy. If you really MUST have a different site, use device detection and canonical tags. If you can’t use the same domain, then the next best choice is m.yourdomain.com. It does not really provide any SEO benefit, but “m” has sort of become the industry standard. Mobile SEO is a Myth, Search Engine Journal.
17. When submitting an app to the various app stores, whether it’s Google Play, Apple’s App Store, or the Windows Phone Marketplace, the text submitted with the app is crucial to the app’s success in the app store itself, as well as in search engines. SEO for Mobile Apps and App Stores, Practical Ecommerce.
18. More and more brands are using their Facebook profile as a mobile PPC landing page. Mobile is not the future of Facebook; it’s now. Like water seeking its level, mobile consumers are already engaging with brands that make their presence easy to find, accessible, and easy to engage with. 10 Optimization Secrets To Drive More Mobile Traffic From Facebook, Search Engine Land.
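One common way to implement "device detection and canonical tags" for a separate m. site, following Google's published guidance for separate mobile URLs, is sketched below. The hostnames are placeholders, the user-agent check is deliberately crude, and the helper names are hypothetical.

```python
# Hypothetical substrings used for crude user-agent sniffing.
MOBILE_HINTS = ("iphone", "android", "mobile", "blackberry", "windows phone")

def is_mobile(user_agent):
    """Very rough device detection; a production site would use a maintained device database."""
    ua = user_agent.lower()
    return any(hint in ua for hint in MOBILE_HINTS)

def head_link(path, mobile_page,
              desktop_host="www.example.com", mobile_host="m.example.com"):
    """Return the <link> tag that ties a desktop URL to its m. twin, or vice versa."""
    desktop_url = f"http://{desktop_host}{path}"
    mobile_url = f"http://{mobile_host}{path}"
    if mobile_page:
        # The mobile page names the desktop page as the canonical version.
        return f'<link rel="canonical" href="{desktop_url}">'
    # The desktop page points to its mobile alternate for small screens.
    return (f'<link rel="alternate" media="only screen and (max-width: 640px)" '
            f'href="{mobile_url}">')

ua = "Mozilla/5.0 (iPhone; CPU iPhone OS 5_1 like Mac OS X)"
print(is_mobile(ua))                          # True
print(head_link("/shoes", mobile_page=True))  # <link rel="canonical" href="http://www.example.com/shoes">
```

The canonical tag consolidates ranking signals on the desktop URL, so the m. copy does not compete with it as duplicate content.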
19. Deliver in 3 seconds. Over 50% of customers use the search feature on mobile sites. Make sure your search bar stands out, on every page of your mobile site, and index it thoroughly with your catalogue. Ten essential tips to increase mobile sales.