Big Data has sent you a friend request. Accept or Ignore?

Did you know that there is a continuing education program, the Big Data Analyst module (Lielo datu analītiķa modulis), offered as a continuing education opportunity for IT professionals at the University of Latvia's Computer Science faculty? Moreover, this program was introduced in cooperation with Accenture Latvia as a subprogram of the Computer Science master's studies.
“The creation of the module is an Accenture Latvia initiative, driven by the need for continuing education opportunities for existing employees.”

I will drop in here sometimes to reflect on my journey and lessons learned.

Why am I there?

Because I want to learn. I am a relational databases and SQL expert, and I hope both to map my current skills onto the Big Data area and to improve in it. I believe we cannot simply turn away and pretend Big Data does not apply to us. At the moment, less than 0.5% of all data is ever analyzed and used. Just imagine the potential here.

After all, this is such an interesting turn in the history of mankind: a greedy burst of data has come crashing over us, and I have chosen to domesticate that big data beast instead of fearing that AI will steal my job.

Am I scared of lacking the capacity? Of course I am! Do I have plenty of free time? Oh, trust me: as a full-time developer, mother of three and servant of three cats, I don’t! This is a very complex and time-consuming program. I must attend and pass exams for six courses within two years, ranging from statistics to prediction algorithms, from Hadoop to R and neural networks. Each lecture runs from 16:30 to 19:45 (heyahh, nearest Narvesen, are you ready to sell a lot of coffee?). This semester it is once a week; next semester, twice a week.

As far as I could see, there are about seven of us newbies this year – wannabe big data analysts from various IT companies. Let’s see what the survival rate will be. Fingers crossed for all of us!

So, I attended the first lecture of possibly the easiest of the six courses – Data processing systems. The professor tempts us with the benefits of completing this module and uses a lot of buzzwords: NoSQL, Hadoop, ACID, BASE, the CAP theorem, MapReduce and so on. We will have tests, quizzes and a practical assignment to set up a system of our choice. The professor also reveals that last year’s students ran restaurants and fitness clubs in their deliverables. He will also pair students at random to force them to integrate their systems. We will have to use our imagination: for example, a restaurant menu accompanied by a personalised offer from a fitness club on how best to burn the calories eaten.

Professor in action

He jokes that we will experience a common real-life situation: MY system is perfect, and then comes the terrible pain of having to integrate my very perfect system with that totally horrible ‘other system’.

After an intro about course prerequisites and the like, we had a warm-up session to illustrate why there is an explosive demand for Big Data analysts. By the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. The accumulated digital universe will grow to around 44 zettabytes, or 44 trillion gigabytes.
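
To get a feeling for these numbers, here is a small back-of-the-envelope sketch in Python. It is my own illustration, not something from the lecture, and it assumes decimal (SI) units, where a gigabyte is 10^9 bytes and a zettabyte is 10^21 bytes. The variable names are made up for this example.

```python
# Decimal (SI) units, which is what "44 trillion gigabytes" implies (an assumption).
MEGABYTE = 10**6
GIGABYTE = 10**9
ZETTABYTE = 10**21

# 44 zettabytes expressed in gigabytes.
digital_universe_gb = 44 * ZETTABYTE // GIGABYTE
print(f"{digital_universe_gb:,} GB")  # 44,000,000,000,000 GB

# 1.7 MB per second per person, accumulated over one day.
SECONDS_PER_DAY = 24 * 60 * 60
per_person_gb_per_day = 1.7 * MEGABYTE * SECONDS_PER_DAY / GIGABYTE
print(f"{per_person_gb_per_day:.2f} GB per person per day")  # 146.88 GB per person per day
```

So the two ways of stating the total agree, and the per-second figure adds up to roughly 147 gigabytes per person per day.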

Why is the amount of data increasing so fast?

I bet you can’t even imagine all the different ways data can be collected. Some of it is used right away for a known purpose, and much of it is accumulated so that its potential can be revealed later (like Facebook, which finds more and more ways to use its fabulous data collection), most likely as raw material to be fed into artificial intelligence.

XKEYSCORE is a formerly secret computer system first used by the United States National Security Agency for searching and analyzing global Internet data, which it collects on a daily basis. The program’s purpose was publicly revealed in July 2013 by Edward Snowden. Content is stored for 3 to 5 days and metadata for 30 to 45 days, and tracking someone’s Internet usage is as easy as entering an email address. The amount of data collected per day is 20+ terabytes.

Creative businesses like Turnstyle gather location data from the pings and signals that smartphones give off when Wi-Fi or Bluetooth is turned on. This lets them track a consumer’s exact path and timing: do they linger by the shoe rack and then decide to head to the espresso bar? These data are sold – isn’t it nice to see where people are crowding or which places they avoid?

Browsers, applications, smartphones, smartwatches, wearable devices, vacuum cleaners and lawn mowers, fridges and car navigation systems – they are all sending data to their base stations. The number of devices and users is growing very fast.

Video cameras and drones, car recorders and health checks, robots for cleaning oil pipelines, buyers’ habits in supermarkets and students’ grades, taxes and account operations, selfies taken and emails sent, phone calls and Google searches, Candy Crush moves and self-driving cars, augmented and virtual realities…

Everywhere around us there is data, and data about data. If we do not use it to make our lives better right now, we are wasting that resource and letting its power sit idle.

Challenges humans face

  • How to learn to use this tremendous amount of big data?
  • How to extract value from big data? Imagine a doctor’s office receiving a stream of each patient’s heart rate, oxygen level and hundreds of other measurements. How to detect anomalies in the data and draw the doctor’s attention to them?
  • How to know what value we can extract from the big data we have? (remember Turnstyle and the consumer path from Wi-Fi; remember Facebook anniversary greetings…)
  • How to search within big data? (remember the sad story in which a path was reconstructed from a small shoe fragment in one of the camera recordings)
  • How to store big data?
  • How to update big data?
  • And, of course, the quite painful issue of data protection. International Data Corporation (IDC) estimates that about 40% of data requires some level of security, from privacy protection to full-encryption “lockdown.” Unfortunately, less than half of that 40% is actually protected.
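
To make the doctor’s office challenge from the list above a bit more tangible, here is a minimal sketch in Python of one possible approach: flag a reading when it lies far from the mean of the last few readings. This is my own toy illustration, not the method from the course. The function name `detect_anomalies`, the window size, the threshold and the sample heart rates are all invented for this example, and a real medical system would need far more care.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling mean."""
    recent = deque(maxlen=window)  # sliding window of "normal" readings
    for i, value in enumerate(readings):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Flag values more than `threshold` standard deviations away.
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                yield i, value
                continue  # keep the outlier out of the baseline window
        recent.append(value)


# Invented sample stream: beats per minute, with one suspicious spike.
heart_rate = [72, 74, 71, 73, 72, 75, 73, 140, 74, 72]
alerts = list(detect_anomalies(heart_rate))
print(alerts)  # [(7, 140)]
```

The sketch processes readings one at a time, so it could in principle sit on top of a live stream instead of a list. Choosing the window and threshold is the hard part, and that is exactly where the analyst comes in.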

This blog is solely my personal reflections.
Any link I share and any piece I write is my own interpretation, and may include what I added by googling to understand the topic better.
This is neither a formal review nor requested feedback, nor complete study material.
