Big Data: CAT, oops, CAP theorem. And ACID and BASE transaction basics, also full of cats


Spoiler alert: a lot of cats today.

[Photo: Picadilla]

A warm-up intro while the cats are approaching: to enable working with large datasets, computers are connected into a distributed system as nodes that share data. Data records are replicated across nodes to keep the system up.

It is always a business owner’s decision what to do when one or more nodes lose connection to the distributed system. Shall the whole system stop, or shall those nodes keep operating if they are able to respond?

Examples

A YouTube node storing a copy of your video loses connection to the others. You watch your video and the statistics show 100 views. Ten minutes later you watch the same video and there are 242,286 views (or vice versa). Oops (either synchronization happened, or you are now watching the video on a different node). But would you feel happier with no video available at all? YouTube has chosen availability over view-count consistency.

[Video: Despacito laukos (a Latvian parody of Despacito)]

Another example: one of the nodes in a ticket sale system loses connection. You connect and see 4 free seats. You press [Book] and see ‘Dear Customers, please come back later, apologies’. You get upset and press refresh for hours until the site recovers, but by then there are no more seats. They have chosen consistency over availability.

Imagine if you had bought a ticket and come to the event – whoops – there is another guy with the same seat. However, I must note that many companies have calculated it is much cheaper to apologize and hand out gift cards or pay penalties than to stop the whole business.

Why can’t they, with their near-unlimited money, just do everything ideally?

Cats proudly present the CAP theorem

You have one cat and you feed it. A single processor, a single INPUT/OUTPUT.

[Photo: Murmor]

Then the era of Big Cats comes and you have three cats: Fred, Murmor and Picadilla. You feed them, write down in your notes which cat was fed and when, and live happily ever after.

One day you got sick and your cats went hungry. A single point of failure happened.

Distributed System introduced

You ask your spouse Alex and your child Max to get involved. Now you are a distributed system with three nodes. They do the same as you: when they see a hungry cat, they feed it and write down in their notebooks that it was fed. (You will ask: why not use a common whiteboard? Because we are talking about the CAP theorem, which applies to distributed systems, and I must pretend that not all feeding data can fit on one whiteboard.)

Some days later you start noticing that Murmor seems fat. You check the notes and find out that each of you has been feeding Murmor several times a day, as this hell boy was constantly pretending to be hungry. The same day you are notified that the Cats Care Government Agency will be doing regular audits.

You call a family meeting and discuss the issue: your data is not consistent, the cats are having a never-ending party, and the Agency is a threat.

Consistency

You decide: before any of you feeds any cat, you call the others and stay on the phone while each writes it down in their notes. Thus, everyone always has the latest feeding time in their notes. Cats are biting your leg, yelling, pretending to faint and sitting on your neck, but you are happy – Consistency is now solved, and you all have the same data. Examination calls from the Cats Care Government Agency are highly welcome.

Everything is just perfect – the mobile network is fine, Alex and Max always pick up the phone, the pens write well and the notebooks have enough blank pages.

One day Alex leaves for an expedition to the jungle. When you call, Alex deeply regrets having forgotten the notes at home. As you have agreed that data consistency is a must-have, it means that day you cannot feed the cats: you and Max would update your notes, but Alex would not. Just imagine the horror: the Cats Care Government Agency might call you and then Alex to ask for the latest feeding date, and come to save the cats by taking them away from these shameless liars. You (heh – the cats) have faced the Availability issue. They are not fed at all now.

Availability

When Alex returns to her notes, you call a family meeting and decide: if any of you cannot take notes, the others still feed the cats, update their own notes and leave red post-its for the absent one. When the others return home, they copy all the post-its into their notes. Voilà, now you have Availability.

You accept the risk that, if the Cats Care Government Agency calls, the latest feeding data might not actually be the latest, but any of you can still share any history statistics – which food you used, how often feeding happened etc. – based on your notes.

So the CAP theorem postulates: when one of you has left their notes at home (a partition occurs in your distributed system – in CAP terminology, the system must tolerate a network partition), you must choose:

  • Either you all guarantee to have the latest feeding date in your notes (and don’t care that the cats are hungry and waiting) – Consistency
  • Or you feed the cats according to your own notes (and don’t care if the beasts are overfed or the Agency might get old data) – Availability

Isn’t it obvious that, during a partition, we can’t have both Consistency and Availability at the same time?
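To make the trade-off concrete, here is a minimal, hypothetical Python sketch (the class and all its names are my own illustration, not any real system): during a partition, a node must either refuse to answer or answer from possibly stale local notes.

class FeedingNode:
    """One family member's notebook, acting as a node in the distributed system."""
    def __init__(self, mode):
        self.mode = mode          # "CP" = consistency first, "AP" = availability first
        self.notes = {}           # cat name -> last feeding time (local replica)
        self.connected = True     # can we reach the other family members?

    def last_feeding(self, cat):
        if self.connected:
            return self.notes.get(cat)   # normal case: replicas are in sync
        if self.mode == "CP":
            raise RuntimeError("Partition: refusing to answer with possibly stale data")
        return self.notes.get(cat)       # AP: always answer, even if stale

node = FeedingNode(mode="AP")
node.notes["Murmor"] = "08:00"
node.connected = False                   # Alex left for the jungle
print(node.last_feeding("Murmor"))       # an AP node still answers (possibly stale)

Flip mode to "CP" and the same call raises an error instead of risking a stale answer – exactly the YouTube vs ticket-sale choice from the examples above.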

Let’s exploit the cats to explain two very famous concepts.

ACID transactions – a pessimistic approach which enforces consistency. The ideal world for data-critical systems like banking (massive data quality checking, a lot of built-ins for transaction control etc. – my native RDBMS world). A small sketch follows the list below.

  • Atomic: all tasks within a transaction succeed, or every task is rolled back. If Max does not succeed in writing the notes, then you and Alex also erase the date from your notes and return the food to the fridge. Cats go crazy.
  • Consistent: on completion of the transaction the database is structurally sound. The notes are up to date without any punctuation errors, and all cats have eaten exactly the food written in the notes. No half-eaten chicken left.
  • Isolated: transactions run sequentially. There is no chance that you and Max are both feeding Picadilla while Fred eats Murmor’s fish.
  • Durable: once a transaction is complete, it cannot be undone, even in the presence of failure. When the food is eaten and suddenly the light goes off or Max steps on Fred’s tail, the food does not reappear in the bowl, and you cannot just decide to add a delicacy for Picadilla – because the transaction is over.
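As a toy illustration of the Atomic property (my own sketch, not real transaction-manager code): either every notebook records the feeding, or none does.

class Notebook:
    def __init__(self, owner, broken=False):
        self.owner, self.broken, self.entries = owner, broken, []
    def write(self, cat):
        if self.broken:
            raise IOError(self.owner + " cannot write")   # e.g. the pen runs dry
        self.entries.append(cat)
    def erase_last(self, cat):
        self.entries.remove(cat)

def feed_cat_transaction(cat, notebooks):
    """Record the feeding in every notebook, or in none at all."""
    written = []
    try:
        for nb in notebooks:
            nb.write(cat)
            written.append(nb)
    except Exception:
        for nb in written:       # rollback: erase the partial writes
            nb.erase_last(cat)
        raise                    # the transaction as a whole fails

books = [Notebook("you"), Notebook("Alex"), Notebook("Max", broken=True)]
try:
    feed_cat_transaction("Picadilla", books)
except IOError:
    print([nb.entries for nb in books])   # [[], [], []] – nothing half-done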

If you had enough patience to read this far, you might notice that with this level of checks you just cannot operate on petabytes. It is like hoping to cut down a forest with a surgical scalpel.

The big data world runs on BASE transactions – an optimistic approach accepting that the database is in a state of flux (much looser than ACID, but much more scalable and big-data friendly). Again, a small sketch follows the list.

  • Basic Availability: the system appears to work most of the time. Either you or Max will always hang around near the fridge, so the cats have a chance to be fed often, even if Alex is in the jungle
  • Soft state: different nodes do not need to be consistent all the time. You feed Picadilla, leave a post-it for Max and don’t care when Max updates the notes
  • Eventual consistency: consistency is achieved lazily, later. Some day Alex returns from the jungle and writes all the dates from the post-its into the notes, so after some time you all actually have the same feeding dates in your notes.
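And a matching toy sketch for Soft state and Eventual consistency (again my own illustration): feedings are accepted immediately, and the absent replica catches up from the post-its later.

notebooks = {"you": {}, "alex": {}, "max": {}}   # one replica per family member
postits = []                                      # pending updates for absentees

def feed(cat, when, present=("you", "max")):
    for person in present:                        # soft state: update whoever is home
        notebooks[person][cat] = when
    for person in set(notebooks) - set(present):  # leave post-its for the rest
        postits.append((person, cat, when))

def reconcile():
    """Alex returns from the jungle and copies the post-its into her notes."""
    while postits:
        person, cat, when = postits.pop(0)
        notebooks[person][cat] = when             # eventually consistent

feed("Picadilla", "08:00")
print(notebooks["alex"])   # {} – stale, but the cat did get fed (availability)
reconcile()
print(notebooks["alex"])   # {'Picadilla': '08:00'} – consistent, eventually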

Thank you all for your patience! Tomorrow is the deadline to apply for the semester-end practice, and I am going to draft and submit a cat auto-feeding system offer.


Big Data: with respect to NoSQL Zoo


Relational databases have many advantages, basically because of their completely structured way of storing data within a fundamental, easily understood structure – the table. But! (c) Despite RDBMS existence and advantages, Google built Bigtable, Amazon developed Amazon DynamoDB, the NSA built Accumulo (in part using Bigtable as an inspiration), Facebook built Cassandra, Powerset built HBase, LinkedIn built Voldemort etc.

There are currently >225 different databases – see http://nosql-database.org/ – “Your Ultimate Guide to the Non-Relational Universe”.

It wouldn’t have happened if the ultra-popular and well-established relational databases had all the capabilities these brands were looking for, would it?

RDBMS are still at the peak of the wave because wide, solid, well-grounded usage over many years, combined with a strong scientific basis, financial capabilities and a lot of lessons learned, has led vendors to invest resources over decades

  • to improve and polish built-in locking and transaction management,
  • to prevent collisions between multiple users updating the same data,
  • to provide highly customizable data access control solutions,
  • to expand SQL capacity (I’ll remind you that outside of core SQL there are very many nuances when querying different vendors’ databases. I’ll show some samples someday later),
  • to offer a lot of metadata (data about data) and utilities (many of them seldom used).

RDBMS will always be alive and used; that is definitely not a concern. Let’s have a simple example to have some fun and also to increase our respect for them. We have

A = 5 (it could be your account balance or a product count in a store)

B = 12

Two operation sets are queued for processing. Maybe Anna and Peter have both pressed the [Apply] button:

Operation set O1

C:=A+B

Write A:=C

Operation set O2

C:=A+B

Write B:=C

Let’s model what may happen:

Scenario 1, O1 first: O1 computes C = 5 + 12 = 17 and writes A := 17; O2 then computes C = 17 + 12 = 29 and writes B := 29. Result: A = 17, B = 29.

Scenario 2, O2 first: O2 computes C = 5 + 12 = 17 and writes B := 17; O1 then computes C = 5 + 17 = 22 and writes A := 22. Result: A = 22, B = 17.

Which answer is correct? Which is not correct? WHY?

The only answer is: first in, first served. They are both correct. Welcome to the world we live in :) If developers do not build any other means to prioritize these operations, they rely on the RDBMS’s internal built-ins for data consistency – serializing, locking and other means (heh, and one of a developer’s support tasks is to be able to track down and explain to end users why the result is 29 or 22).

For some more entertainment, let’s do the same without a transaction and enjoy yet another result:

Without a transaction: O1 and O2 both read A = 5, B = 12 and both compute C = 17; O1 writes A := 17 and O2 writes B := 17. Result: A = 17, B = 17 – each write is based on a stale read.
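This anomaly is the classic lost update. A minimal Python sketch (my own illustration, not RDBMS internals) reproduces both tables: without locking, each operation reads the stale values, while a shared lock serializes them.

import threading, time

A, B = 5, 12
lock = threading.Lock()

def o1(use_lock):
    global A
    with (lock if use_lock else threading.Lock()):   # a fresh lock = no real locking
        c = A + B
        time.sleep(0.01)       # the other session sneaks in right here
        A = c

def o2(use_lock):
    global B
    with (lock if use_lock else threading.Lock()):
        c = A + B
        time.sleep(0.01)
        B = c

for use_lock in (False, True):
    A, B = 5, 12
    t1 = threading.Thread(target=o1, args=(use_lock,))
    t2 = threading.Thread(target=o2, args=(use_lock,))
    t1.start(); t2.start(); t1.join(); t2.join()
    print("locking" if use_lock else "no lock", "->", "A =", A, ", B =", B)
# no lock -> A = 17, B = 17   (both read stale values; one update is lost)
# locking -> A = 17, B = 29 or A = 22, B = 17 (serialized: first in, first served)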

You see, someone must implement that background logic: serializing, prioritizing, handling network failures, concurrent transactions and other mega-stuff. There is an enormous number of built-in RDBMS features, among them both crucial ones and nice-to-have overhead, which reminds me of a Swiss army knife like this:

[Photo: a many-tooled Swiss army knife]

No one would dare to say it is lacking features. Its marketers will also tell you that this knife has a solution for nearly every situation. And actually they are right, aren’t they?

Would YOU dare to say there are too few functions in it? Would YOU recommend it as the way to go for a restaurant chef, a manicurist or a car repair shop?

Of course, I can imagine you saying YES – when there is no other knife at all, when the other option is a spade, or when this is the only tool you have seen in your life.

If you were Google or Amazon or Facebook, you would actually believe there are other ways. Because otherwise you would choke and die, drowning in your data and watching your customers run away.

You then need to deal with consistency, serializing, scaling etc. Everything. Imagine if you had to program the chess game in order to play it. You sit and think: well… should I start with designing the knight, or with figuring out how to stay within the chessboard after a knight’s move?

It is a grave decision, and a serious amount of work and issues to solve, when designing your own system. This is no longer installing Oracle and writing ‘update emp set mgr_id=17’. This is a task for many person-years. It runs in parallel with existing systems and a growing business, it is eagerly awaited and pushed by management, and it must be fast, correct, stable, expandable and a thousand other must-bes.

In 2004 Google began developing its internal data storage system Bigtable, searching for a cheap, distributed way to store and query data at massive scale – petabytes of data on thousands of machines – using a shared-nothing architecture and two different data structures, one for recent writes and one for long-lived data, with a mechanism for moving data from one form to the other.
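That “two structures plus a mover” idea can be sketched in a few lines. The following Python toy is my simplification of the log-structured approach the paragraph describes, not Bigtable code:

class TinyStore:
    def __init__(self, flush_at=3):
        self.memtable = {}       # structure 1: recent writes, mutable and fast
        self.segments = []       # structure 2: long-lived, immutable snapshots
        self.flush_at = flush_at

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_at:     # the "moving" mechanism
            self.segments.append(dict(self.memtable))
            self.memtable.clear()

    def get(self, key):
        if key in self.memtable:                    # newest data wins
            return self.memtable[key]
        for segment in reversed(self.segments):     # then search newest segments
            if key in segment:
                return segment[key]
        return None

store = TinyStore()
for i in range(4):
    store.put("row%d" % i, i)
print(store.get("row0"), store.get("row3"))   # 0 3 – one value came from a segment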

Around the same time, Amazon’s business was growing, and direct database access was one of its major bottlenecks. They developed and released Amazon DynamoDB as the result of 15 years of learning how to implement a database that can store and retrieve any amount of data and serve any level of request traffic.

My deepest respect to all the developers all over the world!

NoSQL

There is no such thing as the one and only ‘NoSQL’ database – no single vendor, nor one server, book, or any silver bullet. As I wrote, there are currently >225 different databases.

Examples of basic classification by data model are:

  • Key-value stores
  • Document stores
  • Column-family (wide-column) stores
  • Graph databases

Popularity and trends

To get an overall understanding of the current state and trends, we can look at sites that measure database popularity. As an example, https://db-engines.com/en/ranking_definition was mentioned. You see they measure by:

  • Number of results in search engine queries
  • Frequency of searches in Google Trends
  • Number of related questions and the number of interested users on the well-known IT Q&A sites Stack Overflow and DBA Stack Exchange
  • Number of job offers in which the system is mentioned
  • Number of profiles on LinkedIn and Upwork in which the system is mentioned
  • Number of Twitter tweets in which the system is mentioned

Actual ranking

See here: https://db-engines.com/en/ranking

[Screenshot: db-engines overall ranking, September 2017]

Of course, Oracle leads there – it has been on the market for years and still has a broad range of usage, serving both as a perfect fit and as a pain to move away from.

It is also great fun to query by type.

Key-value https://db-engines.com/en/ranking/key-value+store

[Screenshot: db-engines key-value store ranking, September 2017]

Document oriented https://db-engines.com/en/ranking/document+store

[Screenshot: db-engines document store ranking, September 2017]

and also a lot of other reports like https://db-engines.com/en/ranking_categories

[Screenshot: db-engines ranking categories, September 2017]

To my great pleasure, the plan of the ‘Data processing systems’ lectures reveals that we will have separate sessions about nearly every one of the most popular approaches. Can’t wait!

The next blog entry will hopefully be about sharding, the CAP theorem, and ACID and BASE transactions. Fingers crossed for some weekend blogging time.

Disclaimer
This blog is solely my personal reflections.
Any link I share and any piece I write is my interpretation and may be my added value by googling to understand the topic better.
This is neither a formal review nor requested feedback and not a complete study material.

Big Data: the curtain rises


This blog entry is inspired by the first lecture of the Data processing systems course, part of my journey Lielo datu analītiķa modulis – tālākizglītības iespēja IT profesionāļiem (the Big Data analyst module, a continuing education opportunity for IT professionals). My reflections on the warm-up session are here.

When we say Big Data, we imagine a lot of analytics, funny and serious findings (e.g. 37 Big Data case studies), a huge data center and a distributed file system beneath the database.

I’ll note here that, despite the big-data buzzwording, file databases are neither a fresh fashion trend nor a new invention. Flat file databases were a natural development early in computing history, long before relational databases flourished. Back then, however, file databases were treated more as theoretical concepts for gourmands, yet nowadays they are a must-have for reining in big data.

I’ll explain why

Any computer science student is familiar with relational database concepts, SQL and PL/SQL (heh, the world where I live like a duck in water). Relational databases (RDBMS) are widely used to perform transactional updates and are especially valued for their strength in handling the difficult issues of consistency during updates.

However, the relational database overhead required to support these complex update operations handicaps them for other functions, like storing and retrieving video and unstructured data. So there has always been a market niche for operating on large amounts of poorly structured data. This niche appeared to be languishing, because people were fond of relational databases for decades and expanded their usage in ways they shouldn’t have, as nobody could imagine the explosive big data burst to come.

Once upon a time, data flow was rather predictable and controllable. The general truth was: you define the structure, then you load data into it, and you decline data that doesn’t fit.

Big Data burst wiped away this belief.

  • Old: the traditional RDBMS approach: first you must define the table, columns and data types, and only then may you load data
  • New: NoSQL (e.g. MongoDB): data may exist first and its definition may come later, as a particular collection of data doesn’t have to be defined before data is added

Nowadays you collect the data first and think later about how to describe and use it.
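A tiny sketch of that mindset shift (illustrative Python, not any particular product): collect records of any shape first, and decide at query time what they mean.

# "Collect first, describe later": no CREATE TABLE happened beforehand,
# and records in the same collection may have different fields.
events = []
events.append({"device": "fridge", "temp_c": 4.2})
events.append({"device": "phone", "lat": 56.95, "lon": 24.11})
events.append({"device": "fridge", "temp_c": 7.9, "door_open": True})

# Schema-on-read: only now do we decide what a "fridge alert" means.
alerts = [e for e in events
          if e.get("device") == "fridge" and e.get("temp_c", 0) > 6]
print(alerts)   # [{'device': 'fridge', 'temp_c': 7.9, 'door_open': True}]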

Progress is impossible without change

A lot of today’s giants started their business on relational databases years ago and, while growing, realized they had to change architecture. Examples: LinkedIn moved from an RDBMS to NoSQL, Twitter moved from MySQL to HDFS Hadoop with Scalding, and Facebook’s photo storage system has already been completely rewritten several times as the site has grown. I won’t say FB moved away from RDBMS, because FB still uses MySQL, but primarily as a key-value persistent store.

Facebook is one of the widely known big data (very big data) producers and consumers – it would never have become popular if your timeline took hours to load (https://www.quora.com/What-is-Facebooks-architecture-6), as it would if they used an RDBMS only.

There are businesses without enough capacity to change, and they agonize. One document management system, widely used in government institutions (let’s not name and shame), has supplemented its historically used RDBMS by storing documents split up across tables. It means that when someone needs to retrieve a document, the implementation takes up to 70 joins of different tables to form that one output document – you can imagine how inconvenient and inefficient that is.

It is like your selfie being stored as separate pieces. Instead of ‘my latest selfie’ you would query something like ‘take the face from the faces table where the face is mine and the photo date is the latest, union the latest shoulders from the shoulders table, join the hands, the coffee mug and the background, then arrange them into a rectangle and display’. Despite the pain of writing that query, I believe it would work quite fast – until thousands of users tried to do the same in parallel.

[Image: a jigsaw-puzzle selfie (picture created using http://www.jigsawplanet.com)]

Back to the basics

You see, storage and retrieval of data can be implemented by means other than the tabular relations used in relational databases. However, while architectural concepts and techniques keep evolving, the basic needs remain the same:

  • Data storing (files, partitions, tablespaces etc)
  • Data definition (create and alter tables, views etc)
  • Data manipulation (select, insert, update, delete)
  • Data control (grant, revoke access)

So, IT guys have been working for decades to make all that possible. There is no single database type suitable for everything. There are hundreds of different systems (by the way, do you know Clusterpoint, a document-oriented database management system, search and Big Data analytics platform with a development center based in Latvia?). Each of them claims to be the best 🙂 and each of them has its strengths. Giants tend to write their own languages and their own architectures. There are custom-written systems, like Haystack, a highly scalable object store used to serve Facebook’s immense number of photos. Facebook Messages uses its own architecture. Facebook Query Language (FQL) was introduced in 2007 (it is no longer available).

Each software manufacturer may call the methods differently and have different syntax and a different approach. But once you grasp the idea, everything else is a matter of mindset, techniques and reading manuals.

Let’s look at some googled examples.

Data definition – creating very simple tables

Oracle (Relational database) example

CREATE TABLE departments 
(  
 department_id number(10) NOT NULL,  
 department_name varchar2(50) NOT NULL,  
 CONSTRAINT departments_pk PRIMARY KEY (department_id) 
);

IBM InfoSphere Hadoop example

CREATE HADOOP TABLE t (
 i int, s string)
 ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
 STORED AS TEXTFILE;

Hive example (Hive Query Language (HQL) statements are similar to standard SQL statements; Apache Hive is considered the standard for interactive SQL queries over petabytes of data in Hadoop):

CREATE TABLE products (url STRING, category STRING)
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '\t'
 STORED AS TEXTFILE
 TBLPROPERTIES ("skip.header.line.count"="1");

MongoDB example

Inside the database are collections, which are somewhat similar to SQL tables. Since Mongo documents exist independently, documents inside the same collection can have different fields. This allows you to create documents inside one collection with different fields that can still be queried by their shared fields.

> db.books.insert( 
{ 
 "title" : "Kafka on the Shore", 
 "author" : "Haruki Murakami", 
 "publish_year" : 2002, 
 "genre" : ["fiction", "magical realism"] 
} )
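If you prefer Python, the same flexibility looks like this with the pymongo driver (a sketch that assumes a MongoDB server running on the default localhost port):

from pymongo import MongoClient

db = MongoClient()["test"]                     # assumes mongod on localhost:27017
db.books.insert_one({"title": "Kafka on the Shore",
                     "author": "Haruki Murakami",
                     "publish_year": 2002})
db.books.insert_one({"title": "Sputnik Sweetheart",
                     "author": "Haruki Murakami",
                     "genre": ["fiction"]})     # different fields, same collection
for book in db.books.find({"author": "Haruki Murakami"}):
    print(book["title"])                       # both documents come back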

Data Manipulation – inserting data

Oracle (RDBMS)

INSERT INTO categories (category_id, category_name) 
VALUES (150, 'Miscellaneous');

MongoDB

db.inventory.insertMany([
   { item: "journal", qty: 25, size: { h: 14, w: 21, uom: "cm" }, status: "A" },
   { item: "notebook", qty: 50, size: { h: 8.5, w: 11, uom: "in" }, status: "A" },
   { item: "paper", qty: 100, size: { h: 8.5, w: 11, uom: "in" }, status: "D" },
   { item: "planner", qty: 75, size: { h: 22.85, w: 30, uom: "cm" }, status: "D" },
   { item: "postcard", qty: 45, size: { h: 10, w: 15.25, uom: "cm" }, status: "A" }
]);

Hive

LOAD DATA INPATH '/tmp/maria_dev/products.tsv' 
OVERWRITE INTO TABLE products;

Data Manipulation – querying data

Oracle (Relational database) – find all departments whose names start with T

SELECT department_id FROM departments 
WHERE UPPER(department_name) LIKE 'T%';

Elasticsearch engine big data query example – find all records that have an author whose name begins with the letter ‘t’:

POST /bookdb_index/book/_search 
{ "query": 
  { "wildcard" : 
    { "authors" : "t*" } 
  }, 
   "_source": ["title", "authors"], 
   "highlight": 
   { "fields" : 
       { "authors" : {} } 
   } 
}

MongoDB example

Select all documents from the collection where the status equals “A” and either the quantity is less than 30 or the item starts with the character p:

db.inventory.find( { 
    status: "A", 
    $or: [ { qty: { $lt: 30 } }, { item: /^p/ } ] 
} )
The operation corresponds to the following SQL statement:

SELECT * FROM inventory 
WHERE status = "A" 
AND (qty < 30 OR item LIKE "p%")

Hive example
SELECT to_date(o.ts) logdate, o.url, o.ip, 
o.city, upper(o.state) state,
o.country, p.category, 
CAST(datediff(from_unixtime(unix_timestamp()),
 from_unixtime(unix_timestamp(u.birth_dt, 'dd-MMM-yy'))) / 365 AS INT) age, 
 u.gender_cd
FROM omniture o
INNER JOIN products p
ON o.url = p.url
LEFT OUTER JOIN users u
ON o.swid = concat('{', u.swid , '}');

It seems Hive does not allow UPDATE, only INSERT OVERWRITE.

Other Big Data examples – to be honest, it was quite hard to google these without a proper understanding of what is what in the Big Data world.

Hopefully I’ll find the answers later during the course. Challenge, should I accept it?

Disclaimer
This blog is solely my personal reflections.
Any link I share and any piece I write is my interpretation and may be my added value by googling to understand the topic better.
This is neither a formal review nor requested feedback and not a complete study material.

Big Data has sent you a friend request. Accept or Ignore?


Do you know there is a continuing education program – the Big Data analytics module Lielo datu analītiķa modulis – tālākizglītības iespēja IT profesionāļiem – at the University of Latvia’s Faculty of Computing? Moreover, this program was introduced in cooperation with Accenture Latvia as a subprogram of the Computer Science master’s studies.
“The creation of the module is an Accenture Latvia initiative, born of the need for continuing education opportunities for existing employees.”

I will drop in here sometimes to reflect on my journey and lessons learned.

Why am I there?

Because I want to learn. I am a relational database and SQL expert, and I hope both to map my current skills and to improve in the Big Data area. I believe we have no option to turn away and pretend Big Data does not apply to us. At the moment, less than 0.5% of all data is ever analyzed and used – just imagine the potential here.

After all, it is such an interesting turn in the existence of mankind, this greedy burst of data washing over us, and I have chosen to domesticate the big data beast instead of fearing that AI will steal my job.

Am I scared of a lack of capacity? Of course I am! Do I have plenty of free time? Oh, you can trust me, as a full-time developer and mother-of-three and servant-of-three-cats, I haven’t! This is a very complex and time-consuming program: I must attend and pass exams for six courses within 2 years, from statistics to prediction algorithms, from Hadoop to R and neural networks. Each lecture lasts from 16:30 to 19:45 (heyahh, nearest Narvesen, are you ready to sell a lot of coffee?). This semester once a week, next semester twice a week.

I saw that this year we are about seven newbies – wannabe big data analysts from various IT companies. Let’s see the survival rate. Fingers crossed for all of us!

So, I attended the first lecture of possibly the easiest course of the six in total – Data processing systems. The professor tempts us with the benefits of completing this module and uses a lot of buzzwords – NoSQL, Hadoop, ACID, BASE, CAP theorem, MapReduce etc. We will have tests, quizzes and a practice assignment to build a system of our choice. The professor also reveals that previous years’ students built restaurants and fitness clubs in their deliverables. Also, he will randomly pair students to force their systems to integrate. We will have to use our imagination – like a restaurant menu accompanied by a personalized offer from a fitness club for the best way to burn the calories eaten.

[Photo: the professor in action]

He jokes that we will experience a common real-life situation: MY system is perfect, and then comes that terrible pain when you need to integrate MY very perfect system with that totally horrible ‘other system’.

After an intro about course prerequisites etc., we had a warm-up session illustrating why there is explosive demand for Big Data analysts. By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet, and the accumulated digital universe will grow to around 44 zettabytes, or 44 trillion gigabytes.

Why is the amount of data increasing so fast?

I bet you can’t even imagine all the different ways there are for data to be collected – some of it used right now for a known purpose, much of it accumulated to reveal its potential later (as Facebook keeps finding more and more ways to use its fabulous data collection), and most likely to be fed into artificial intelligence as a basis.

XKEYSCORE is a formerly secret computer system first used by the United States National Security Agency for searching and analyzing global Internet data, which it collects on a daily basis. The program’s purpose was publicly revealed in July 2013 by Edward Snowden. It retains content for 3 to 5 days and metadata for 30 to 45 days, and tracking someone’s Internet usage is as easy as entering an email address. The amount of data is 20+ terabytes per day.

Creative businesses like Turnstyle gather location data from the pings and signals smartphones give off when Wi-Fi or Bluetooth is turned on – to track a consumer’s exact path and timing: do they linger by the shoe rack and then decide to head to the espresso bar? These data are sold – isn’t it nice to see where people crowd and what they avoid?

Browsers, applications, smartphones, smartwatches, wearable devices, vacuum cleaners and lawn mowers, fridges and car navigation systems – they are all sending data to their base stations. The number of devices and users is growing very fast.

Video cameras and drones, car recorders and health checks, robots cleaning oil pipelines, buyers’ habits in supermarkets and students’ grades, taxes and account operations, selfies taken and emails sent, phone calls and Google searches, Candy Crush moves and self-driving cars, augmented and virtual realities…

Everywhere around us there is data, and data about data. If we do not use it to make our lives better right now, we are wasting that resource, letting its power sit idle.

Challenges humans face

  • How to learn to use this tremendous amount of big data?
  • How to extract value from big data? Imagine a doctor’s office receiving a stream of a patient’s heart rate, oxygen level and hundreds of other measurements. How to detect anomalies in the data and trigger the doctor’s attention?
  • How to know what value we can extract from the big data we have? (Remember Turnstyle and the consumer path from Wi-Fi; remember Facebook anniversary greetings…)
  • How to search within big data? (Remember the sad story where a person’s path was reconstructed from a small shoe fragment in one camera recording.)
  • How to store big data?
  • How to update big data?
  • And, of course, the quite painful issue of data protection. International Data Corporation (IDC) estimates that about 40% of data requires some level of security, from privacy protection to full-encryption “lockdown.” Unfortunately, of that 40%, less than half actually has protection.

Disclaimer
This blog is solely my personal reflections.
Any link I share and any piece I write is my interpretation and may be my added value by googling to understand the topic better.
This is neither a formal review nor requested feedback and not a complete study material.

Milk for the warehouse workers – it’s hazardous work


In my opinion, an analyst has to grow up, mature and ripen in the OLTP fields before working on a data warehouse. A data warehouse is serious business. It means decisions, it means expectations (you surely know a marketing slogan or two), it means doing mightily complex work, no matter how many clever books and all kinds of blogs :) full of advice have been written. And there is the warehouse worker’s daily to-be-or-not-to-be: fix the data in the warehouse, or throw yourself on the barricades fighting for better sources?

 

Programmers know: to find a bug, you must learn to think like the bug thinks. Paraphrasing that, I say a DW (data warehouse) analyst should eat a pood of salt in the source systems before touching the DW. You must understand what the source people do, how and why, which rakes they step on, how their minds work. Why deny it – I myself have been hit by the boomerang of OLTP developers not thinking beyond their own yard. There is usually nothing that cannot be done. It just takes more time and ingenuity.

 

The more we talk about this, the better the chance the warehouse workers will be heard. I’ll add the caveat that there are, of course, companies where this is already taken seriously. What is written here is compiled from various analysts’ experience and from industry myths and legends. All examples are anonymized.

 

In the ideal world, the DW vs OLTP problem has been identified, and the OLTP people and the DW people join hands and, smiling, march toward a common goal – an even more ideal world. Everyone knows which data is taken to the DW and how, the conditions match on both sides, changes are always agreed in good time, source data correction scripts are either unnecessary or clean up their own tails for the DW, and there is plenty of nice test data on first request or even without asking. I sincerely hope that is exactly how it is at your place.

 

In the non-ideal world, things do not go so smoothly for the data warehouse people. Why does a non-ideal parallel world exist? Because it happens that:

 

*) the sources and the DW have different developers (with all the resulting bureaucracy),

 

*) there is no person with one foot in the source and the other in the warehouse (if the role is merely formal, there is little point; it usually ends with one side keeping that person busier and busier until they take them over completely),

 

*) the systems have different development budgets (and then, as you understand, ping-pong),

 

*) the source people are up to their ears in their own work (and, honestly, have no desire to dig through old SQLs, because for THEM everything works),

 

*) the warehouse was introduced just to tick a box. The end users keep their antediluvian Excels anyway, while the original developer covered themselves with paperwork,

 

*) the DW people spoil the users: we’ll manage somehow, why make a fuss, ever since the update command was invented.

 

A few cases that give DW analysts gray hairs:

 

1) When the source changes the meaning of a field at some point – in the old days ‘yes’ in rekins.nosutits was ‘0’, but since the euro introduction ‘yes’ has become ‘1’. Sometimes even for a curious reason: the new source analyst mixed it up, and when they discovered it was already built, it stayed that way.

 

2) When the meaning of a field’s content differs between situations. When for leasing the field klients.adrese holds the buyer’s address, but for a loan – the guarantor’s address. Plus some specific case: for example, while up to 3 payments are overdue it holds the address the first reminder was sent to, but once more than 3 are overdue, the address the latest reminder was sent to. In the source system people put in their ifs, test them, record some part in a change design document (stored somewhere along with 376 other small documents), some part in code comments, deliver and happily forget; in the source forms the addresses jump nicely into the right fields. Some time later the data warehouse people try to understand the logic.

 

A classic of the genre is the phrase ‘to be clarified later’ at this very spot in the source system design, or the comment ‘Inokentijs will fill this in’. Spoiler alert: Inokentijs is long gone over the blue horizon.

 

The DW then ends up with something like these:

 

A) Take it into the DW as ‘address’ and write in the description something like ‘For loans, the guarantor’s address; in other cases, whatever the source documentation describes’. Field migrated, conscience clear, let the users cope on their own.

 

B) For the cases that can be told apart, create DW objects like ‘Lessee’s address’, ‘Guarantor’s address’, ‘Other addresses’, and then split them with case-ifs in the transformation, in post-processing or in the reports.

 

C) The DW people somehow mystically get the source to clean up its data and introduce separate fields on its side. Or at least to hand over precise where conditions.

 

D) A variation – the DW people build it in their best conscience. Meanwhile a new, hard-working analyst arrives in the source system and tidies things up. At some point a user gets suspicious about why borrowers so often live at the same address as their guarantors, and then the football season can be declared open.

 

3) The change detection mechanism. Granted, the source system does not always need one: they run an update or physically delete rows. But afterwards the DW has to do rocket science. Personally, I am now fond of every row in every source table carrying its insertion and last-modification timestamps, while deleted rows are marked as logically deleted, or at least their keys are listed somewhere as physically deleted.

 

4) Half-documented, half-forgotten quirks of the source system databases. It is lovely when there are nice, honest fields and tables like ‘darijums’ with a classified attribute saying loan or leasing. Or separate tables ‘kredits’ and ‘lizings’. It is not so lovely when there is a table ‘na_kustiba’ with fields paz1, paz2, paz3, paz4, and you have to jump out of your skin to learn that it is a loan when paz1=Z and paz3=9, but leasing when paz2=Y, paz3=4Q and paz4 is empty. You get the idea, I hope. This is characteristic of old systems. Unfortunately it also happens in newer ones whose developers got entangled in their own craftsmanship.

 

Here, however, please keep in mind: we are all only human. I know a real case where the project people asked a newly hired, high-class PL/SQL programmer to write simpler code. Because she would finish the job and leave for another project, while those who stayed would have to live with those dynamic arrays and other hard-to-read constructs. This insight taught me, for example, to give a missing-date lookup its placeholder not as … else ‘25000101’, but as … else to_char(to_date(’01-JAN-2500′),’YYYYMMDD’), because in the second case it is, in my opinion, immediately clear that this is an identifier built from a placeholder date.
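The same idea in a short Python sketch (my own illustration): build the placeholder from an explicit date, so the intent is obvious to whoever inherits the code.

from datetime import date

SENTINEL_OPAQUE = "25000101"                          # what is this magic string?
SENTINEL_CLEAR = date(2500, 1, 1).strftime("%Y%m%d")  # clearly a far-future placeholder

assert SENTINEL_OPAQUE == SENTINEL_CLEAR              # same value, clearer intent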

 

Some of the questions that, in a good world, are solved by architects and quality managers – and if not, the DW analysts’ heads smoke:

 

1) In the source system, data – say, statuses (submitted, approved…) – are looked up by ids from different classifiers depending on the document type (with insignificant (?) nuances: for one type ‘approved’ might be called ‘accepted’; and there may be further nuances, where for another type ‘approved’ means, business-wise, the same as ‘submitted’ does for yet another). Should the DW put them all into one status dimension table with extra attributes, or take each one over separately?

 

2) What to do when the source contains a faulty solution that does not bother the source system (users are used to it, or do not use that field)? Or when the source functionality differs from the source design? What is that? A bug? A feature? A todo? (Here I’ll remind you that Inokentijs is gone, and Maija Saprātiņa, who was assigned that piece, is busy taking another block to production.)

 

3) What to do when, while building something else, you find a bug in a DW piece that has been running for ages? Say, digging through the ETL you notice that, when taking over product descriptions from an invoice, it ties the product number to the description number, so the first product gets the first descriptions of all the invoice’s products, the second product gets all the second ones, and so on. A cocktail. But should you raise a panic if nobody has complained? Who knows how many places and reports it has already crept into, and how it was patched up there. Who will cast the first stone if the DW analyst keeps quiet – you’d end up fixing it yourself, and you have enough on your plate already. A variation: Pēcis built it, no desire to quarrel. A sub-variation: testers are paid to find such things, let them report it. A sub-sub-variation: better not to disturb anyone, because maybe, for some unknown reason, that was the requirement.

 

4) Where to apply NVL, TRIM and the like? In the source data extraction select? In the transformation piece? In post-processing? Right before inclusion in a report? Or apply them to all fields as standard processing, even where they supposedly aren’t needed?

 

5) Should the source’s business constraints be duplicated – say, that only ‘1’, ‘2’, ‘3’ are allowed on input – or should the data be taken without analysis? There are projects where the transformation is made to fail when a value is out of range. There are some where such rows are dumped into a special table for review. There are some where the data is taken over without a second glance. And there are some with data quality controls, for example a weekly control select checking whether data outside ‘1’, ‘2’, ‘3’ has appeared in the DW. But – such control selects have to be devised, implemented and maintained (!!!!!!!). And you must know what to do when records with other values turn up. As a friend of mine, an experienced analyst, says: ‘do not go looking for an error if you don’t know what you’ll do with it.’

 

How to save on dyeing your gray hair?

 

Data warehouse job ads give the place of honor to COMMUNICATION SKILLS and analytical skills. Rightly so. Though I hope that in the foreseeable future more and more companies will get these things in order, so that the DW people’s communication skills (read: only if you can sweet-talk the source people will you – and only then – get correct source data) are no longer decisive, and providing information to the DW project people becomes an ordinary, planned and controlled work task for the source system people.

 

Where companies are still on the way there, I think it is worth it for DW people to know and understand the source system’s data structures and functionality. Because in reality the documentation will more likely not exist than exist, the developers will have neither the time nor the desire to dig in, and access to the most complete documentation in the world – the source IS’s source code – is 1) a rarely available luxury, and 2) not always code the DW person can read fluently.

 

My approach: if possible, find a way to operate the source system yourself – to see and work through how the data comes to exist, how the whole process goes. For example, if invoice data is to be taken over to the data warehouse, then how are invoices created, and what is their life cycle? Along the way the concepts become clearer. Ideally, you have access to all the documentation, including the historical. Reading how the systems have evolved over time helps you understand the background, the problems, the weak spots. And if you can reach the source database with SQL – yum!

 

Besides that fantastic feeling of freedom (which a DW person has when they know how to, and are allowed to, write all sorts of business selects against the source IS), you also gain a picture of DW testing, because one approach is that users open the same document or list in both systems and compare. In my experience, users loved documents that described in detail, with pictures and arrows, which little fields from the source system’s forms and lists correspond to which ones in the data warehouse. For example, the names may differ (for objective reasons: it can happen that new forms have long existed, but the old system is never changed and everyone is used to the field ‘address’ holding a personal identity number), while the new projects, including the DW, use the new names.

 

To finish, a generalized piece of advice: before condemning (if you do that at all), make an effort to understand the train of thought. It has happened that a conversation starts with ‘which idiot came up with this’ and ends with ‘what an interesting and resourceful solution’! :)

 

A PL/SQL developer certification memoir


In programming exams the devil is in the details (vs role certifications, where the devil is in the concepts). To earn the Oracle PL/SQL Developer Certified Associate certificate, I passed two exams:

  • Oracle Database: SQL Fundamentals I
  • Program with PL/SQL.

Both are part of my daily work, because in big national-scale projects a system analyst is often a one-person orchestra for whom conducting tables, queries, updates, views, procedures and triggers is routine – Informix, Oracle, MS SQL, Sybase… Sure, SQL is SQL even in Africa, but each dialect is overgrown with features, and it is not easy to recall offhand in which DBMS varchar2 requires an explicit minimum length, in which it defaults to 1, and in which there is no varchar2 at all.

So, to certify the knowledge on paper, you have to demonstrate compiler skills at a brisk pace. About 1.5 minutes per question: read it, understand it; it usually has 4 answer options, each a different chain of functions. That leaves about 20 seconds to pass judgment on one chain. The faster you learn to scan a chain for the typical gotcha spots, the more time remains to study the suspiciously correct ones:

  • Will select lower(replace(trim('son' from cust_last_name),'An','O')) from customers work?
  • What happens when you run select initcap(cust_first_name||' '||upper(substr(cust_city,-length(cust_city),2))) from customers?
  • Will to_number(to_number(prod_price,'$99,999.99')*.25,'$99,999.00') return $6,509.75?
  • Will select lpad(substr(cust_name,instr(cust_name,' ')),length(cust_name)-instr(cust_name,' '),'*') from customers where instr(cust_name,' ',-1,2)<>0 select only customers with 3 names?
  • Can you subtract pi from sysdate?

And so on. I had to refresh the theory. I recommend Gints Plivna’s blog – excellent study materials!!! I read, switched my brain into mini-compiler mode (down with the scientific poking method of ‘hmm, I wonder, will this compile?’ and ‘it’ll whine soon enough if this returns nothing’), set myself the goal of writing it correctly on the first try, and drew Venn diagrams on paper.

Technical challenges

If you have used something for years then, broadly speaking, just getting to the exam room is 3/4 of the victory. Still, you cannot rest on your laurels:

  • Notation perversions in formats unusual for Latvia – starting with $1,234.67, continuing with October 25th and DD-MON-RR, and finally AM and PM. I had to learn those $99G99D00, $9,999V99, fxDay, fmDdspth…
  • I don’t use all the functions in the exam scope daily, so I studied them (coalesce, for example): what they do, the number of arguments, which are mandatory, which have defaults and what those are
  • Argument values and behavior. For example, if you pass instr a negative string length, it fails with an error instead of treating it as null, 0 or 1. Remembering this made it easy to quickly eliminate an answer option here and there
  • Function chains – oh, how the exam loves those… nvl2(coalesce(decode(substr(instr(length(… and variations, for example: can you join on A = length(nvl(substr(coalesce(B,instr(A,1,2)),B),'null'))? Ugh, at what a pace I had to compile these in my head during the exam :)
  • Type conversions – where they happen, of what kind, and where they are not allowed – decode, nvl, nvl2, and also those (to_number(to_date(to_char(… – outside the exam, F1 and Google help with these
  • What happens to NULLs in various functions (thanks, Gints!): can you take substr of null, and is nullif(1,null) the same as nullif(null,1)? If concat appends a null, does the whole result become null?
  • Aliases for columns, joins, groups, orders and havings – where you may use them, where you may not, where you must, and where it doesn’t matter
  • They catch you on similar-sounding functions – the classic is months_between, days_between, years_between
  • Many JOINs in all notations – (+), OUTER; they catch you on the differences between USING and ON, and on the nuances of alias usage

Study process challenges

  • The internet offers examples similar to the exam questions but with wrong answers, so either verify every answer, or at least develop a gut feeling that an answer might not be correct. I played through almost all the sample questions in my own sandbox, studying both the correct and the incorrect answers
  • Googled examples come without explanations of the answers. It says the correct one is ‘B’, but why? A variation: several answers (A, C, D…) are marked correct, although in the exam only one can be. Then you have to figure out the catch. In forums you can find ‘please explain why…’ – but many replies are of the ‘what kind of idiot doesn’t see that the right answer is obviously B’ variety
  • The drill test sometimes contradicts both the study materials and practice. In the already-mentioned varchar2 case, the drill counted as correct that the length may be omitted in the type definition, while both the materials and a real Oracle fell over with an error if you omit the varchar2 length
  • An environment exactly matching the examined one was not available. In the one I used, for example, the column order could be altered (so you simply have to memorize that in the exam, the answer to ‘can you change the column order of an existing table?’ is ‘no’)
  • The drill test provided by BDA is easier than the real exam. I talked about this with the BDA people too, but they cannot influence anything – it is what it is. So I warn you: do not rely on the drill alone

At first, compiling in my head went hard – it is unusual, after all. Diligence resulted in 91% (60% would have sufficed – the threshold is lowish because the exam IS fiddly).

Notes from Business Analysis Conference Europe’2014


Luckily I had the pleasure to attend one of the most respected and awaited events in the Business analyst community – Business Analysis Conference Europe 2014 – as one of 416 attendees, amongst delegates from the UK, Germany, Malaysia, Switzerland, New Zealand, Ukraine and many other countries.

Just to fill you in on some background about the audience – more than 370 BAs were there: Principal, Lead, Senior, Team Lead, Manager; a few nice job titles I noticed: Business Analyst Mentor, Managing Analyst, Principal Technologist, BA Resource Lead, IT Business Analysis Expert, Business Analyst Designer, Staff Business Analyst, Mobile Applications BA. That’s quite an impressive BA density per square meter, isn’t it?

And last but not least – three System Analysts. Thanks, Canada and UK, for not leaving me alone :) I did some small research and found out that this role has almost disappeared there. A BA’s output usually goes to a System/Enterprise/Technical Architect as input, and the architect collaborates with programmers. I believe we here in Latvia will experience the same soon.

OK, moving on to the conference. It was a challenge to choose a two-day road map with 5 tracks and 40+ interesting presentations on offer. The questions I had:

  • Is business analysis rebranding itself towards system analysis?
  • Is pretending to be agile really agile? If we have a meeting every day, are we using scrum?
  • What are default values in business analysis nowadays?

So I attended 16 nice events and would like to share some of my impressions and notes. I’ve divided this entry into four main topics – Communication, Agile, Business analysis and life, and Experiences. First of all, let’s start with a warm-up. You’ll find the answer later, if you are in doubt. Which is the BA’s favorite answer?

  • It’s out of scope
  • Let’s have a meeting about that
  • To be detailed later
  • It depends
  • This will be done automatically

Let me just touch on a BA manifesto I collected:

  • Soft skills über alles. Everything else can be learned
  • Communication – investment vs waste
  • BA must have survival skills and stress management techniques (under pressure, stress etc)
  • Know and do time management
  • Set a goal – every week become a bit more skilled
  • Ask questions! Great BA’s ask – WHY?
  • Understand your values
  • CARE ABOUT YOURSELF. No-one else will – you are a resource

Communication

It’s Not What You Say; it’s What’s Heard, Suzanne Robertson

  • Everything is communication. It’s a lifelong challenge
  • BA’s job is to move ideas from one thick skull to another: users, developers, management…
  • Always invest time to align expectations, share mental model
  • Avoid empty announcements, keep positive thread:
    • Flight delayed!
    • Flight delayed because of bad weather. We don’t know for how long but we will be back in 15 minutes to announce news
  • Make noise free environment. Eliminate or at least reduce any
    • visual noise (too bright picture, rough wall…)
    • audial noise (phones, cars…)
    • kinesthetic noise (smell…)
  • Put chairs straight, keep coffee hot, pencils sharp… or they’ll steal attention away
  • If BA knows a formal modelling language, it is like knowing one more foreign language
  • People write more harshly than they speak. Call!

Information is beautiful, David McCandless, journalist

  • Protect yourself and others from lies
  • A single data point is not the truth. Provide context!
  • $ 100 000 000 cut!!?? Is it a big deal?

[Infographic: the $100,000,000 cut shown in context]

Be creative! You have so many dimensions!

  • Placement
  • Colour
  • Size
  • Text
  • Hierarchy
  • Font
  • Shape
  • Motion

[Image: creative use of visual dimensions]

Animation – free from the static world, Andrew Turner

  • Visualize and animate requirements, events, scenarios
  • A picture is worth a thousand words and one minute of video is equal to 8 million words
  • 59% of executives would rather watch video than read text
  • Viewers retain 95% of a message when they watch it in a video – compared to 10% when reading it in text
  • Ideas when to use animation:
    • Data flow in diagrams
    • Process flow in models
    • Process before/after
    • Data transitions
    • Usability problems, solutions
    • To align expectations with assumptions

Agile

Has the relentless march toward Agile eliminated the need for Business Analysts? Andrew Jacques

  • Software should be Agile unless there is a good reason to go Waterfall
  • Waterfall projects can still use Agile techniques within stages
  • A product owner will be involved full time, and ideally they will come from the business rather than IT
  • We have docs, but it’s not a purpose
  • Requirements management tools + documented code + documentation spread over requirements: e.g. JIRA-attached user stories, mockups
  • No «pick solution» thinking. We «buy-in» solutions
  • Agile is not a way to reboot a bad team. Agile needs good team
  • 3-week sprints motivate – everyone sees a real result
  • Industry still has little experience maintaining agile projects

Transition from Waterfall to Agile, Tony Hanton, James Fitzgerald

  • Agile is not a silver bullet
  • Agile is a competency, not a methodology:
    • mindset, not template
    • culture, not methods
  • It’s hard to be agile when the team doesn’t understand the business domain
  • Short sprints motivate :)

Business analysis and life

The Business Analyst’s Identity Crisis, Richard Shreeve, Consultancy Director

Business analysis is increasingly blurred. While the roles are still distinct, many of the same skills and competencies are required.

[Image: overlapping BA and architect identities]

Hints to survive:

  • Standardize on a repository-based tool to promote the sharing and re-use of business, information, application and technology models
  • Use frameworks
  • Create templates
  • Draw on existing models
  • Search for best practices
  • Use software for checking consistency etc.

Enhance your BA through UX! Yuri Vedenin

BA’s, please

  • …live in users’ shoes, not the process’s
  • …start building more human-centered products
  • …care more about your users when preparing mockups and scenarios

Thriving in a world of change Professor Eddie Obeng

[Image: the lines puzzle]

When you were a child you saw a similar puzzle. In THAT problem the lines were the same length. And you learnt the ‘correct answer’.

Correct answers don’t live forever. Beware of them.

[Image: the ‘correct answer’]

World has changed.

By the way, the Professor suggests: do NOTHING of NO use!

Play! Georgiana Mannion

  • «You can learn more about a person in an hour of play than you can from a lifetime of conversation»
  • A game helps to get rid of taught answers
  • The opposite of play isn’t work, it’s depression

A better way of working, Lambert David

  • ~5 meetings a day, 60% of our time on email – when did you last have 3 hours of uninterrupted work?
  • BA problem: no time to lift up your head and think…
  • There IS a better way! Extract maximum from
    • Physical space
    • Technologies
    • Culture

Besides, it was interesting to learn about a company where everything is designed for BAs to achieve maximum efficiency. Actually there is no better picture, and I use my quick photo just to illustrate the idea:

[Photo: the office]

  • How does it work there? A new project starts; the team gets its place. Stay or move, as your team or you wish. Where would you like to work today? Go there! Find a place where you feel best today or right now and work there. No reservations required, no limits set
  • You are mobile. Everywhere is fully equipped with IT and data – just plug in
  • Various type of rooms for concentration
    • guaranteed silence, no one else moving nearby…
    • greenery, single chapels, meditation carpets, yoga…
  • Different rooms for small/medium/large teams, different types for any team:
    • formal meetings, long whiteboard, interactive shared whiteboards, tables, chairs…
    • informal meetings, brainstorming, creative
    • leisure, eating, snacks, recreation, kitchens, free drinks, snacks, food
  • Large open hub where all fit

Looking at the situation from a mortal’s point of view, without such a perfect office, what can we do TODAY?

  • Don’t wait for changes from the top. Change yourself and you’ll have followers
  • Change your environment. Make it nice, encouraging
  • Never let the environment eat you up or suppress your talent. Never!!!
  • Organize events
  • Share problems, solve problems
  • Provide feedback
  • Smile, be positive and noticeable

Experiences

Can a BA add value to an IS security project? Gillian Walton

  • Traditionally, the information security team is too focused on the physical security of devices and less on the security of the information itself…
  • Through a partnership with security, business analysts can play a key role in ensuring adequate security controls are included in the systems requirements
  • A law firm’s project; management is the sponsor
  • Customers demand security
  • ~6 months preparing, studying ISO/IEC 27001
  • Business analyst: look at the processes, document them and make sure no breaches are possible
    • What to be protected?
    • Where are gaps?
    • Where are potential gaps?
    • Doubled, tripled data and process holders…
    • Communication channels…
    • Spreadsheets, printouts around, back and forth
  • Create culture!
  • That bunch was a hard nut to crack, but it was successfully moved to a new IS and later certified to ISO/IEC 27001

       – How do you know there are zero breaches?

       – I hope :) I know the processes and we did our best. Show people you DO care and do your best!

A Business Analyst’s Journey to Lead Business Analyst, Isha Jain, UK IT Business Analyst of the Year 2013

It was a totally amazing presentation! ‘You are my stakeholders right now, and I want to be sure that you step out of this room having achieved your objectives.’ Isha remembered every question (and the name of each questioner) and got his or her confirmation after responding.

When Isha started her career path as a business analyst, she had a lot of questions:

  • What is the scope? Is it replacing a system for 200 people across the department, or is it more?
  • What will business tell me as their requirements? Do I want to hear it?
    • System Problems?
    • Things they do in spreadsheets (~100)?
  • Will that give real Value to the programme?
  • Can I be sure that the requirements will be the real business needs?
  • Can I be confident that these requirements will eliminate all inefficiencies in the department and deliver real business benefits?
  • Remember – people’s perception is built from how you position yourself

Naturally she was noticed and promoted. The next move was to define the BA role in the Company: ‘I want a BA to…’

  • Find real requirements that give real value
  • Document Requirements
  • Write Business case
  • Write an Investment proposal
  • Create business process models
  • Be bridge between business and IT
  • Help developers to clarify requirements

I got authority -> people started to believe me -> people started to follow me

Before, there were 7 branches and a lot of “irreplaceable” analysts. Isha was quite sure that was not a sustainable situation, and her direction was toward BAs as a single entity (a pool). Now, as senior management, Isha:

  • Established a framework – tools for requirements management, templates, different workshop styles (200 people there)
  • Created a BA service catalogue to avoid body shopping – provide services, not bodies!
  • Recommends the best approach
  • Leads IS and senior leaders through change
  • Identifies best practices in the industry
  • Provides services not just to projects but to the wider business
  • Brings visibility to the BA Practice, mentoring BAs
  • Maintains performance hubs – measure, improve
  • Builds a skills framework and training catalogue

Some of Isha’s niceties:

  • Think. Ask. Repeat as necessary :)
  • Remember and answer questions. All questions. Always.
  • Proud and happy doing my job
  • Take time out to celebrate Success and Achievements

Now, thank you for staying with my blog, and, turning to the end of my notes, here come my findings:

  • Is business analysis rebranding itself towards system analysis?
    • No – it takes over the most pleasant part of IT
  • Is pretending to be agile really agile?
    • Seems yes… but not as dogma – great!
  • What are default values in business analysis nowadays?
    • Communication, talented BAs, requirement management tools

And finally – the BA’s favorite answer : it depends!
