
Chapter 10 | From Data to Understanding



Silicon Valley companies, with the likes of Google leading the charge. The classic anecdote that symbolizes their reliance on data is how Google ran an experiment to collect data on what shade of blue (out of 41 candidates2) would be most appropriate for a button. A designer who left Google because of this type of behavior said it was an indication of the lack of appreciation of intuition and creativity, necessary preconditions for innovation in design.3

Data-driven AI techniques have only exacerbated the issue and strengthened the argument of those who call for every decision to be data driven. A favorite statement to throw around in these data-obsessed times is that data is the new oil.4 It should be treated as a commodity that an organization drills for, extracts, transforms, and uses to generate value for itself. Thankfully, at the same time there are more balanced voices that caution us not to adopt such a data-centric view of the world. In this chapter we are going to follow that more balanced approach to how we can think of data within an organization.

The aim of this chapter is to provide some starting points and useful strategies for how to think of data in your own organization. Despite voicing concerns about overly relying on data, I strongly believe that data is crucial. It is a valuable resource and we need to treat it as such. We should always, however, question and challenge the notion that data is valuable intrinsically, outside of a specific context and purpose. This will ensure that the effort we put into data management and curation is directed and efficient.


Giving Data Purpose
In considering the purpose of data it is useful to take a look back at how data got elevated to be considered the most valuable commodity, and how that position is now being reconsidered or, at least, should be reconsidered.

Being able to store information has always been useful. However, for a long time the potential opportunities were tempered by the cost associated with storing and dealing with data. When data storage was counted in a few megabytes and the devices to handle it took up entire floors in company buildings, you had to carefully consider why you were doing it. As such, retaining data was something most organizations would avoid beyond what was necessary to enable the proper functioning of the organization. The purpose of data had to be very specific, because holding on to it and usefully exploiting it had a very noticeable cost.




2 www.nytimes.com/2009/03/01/business/01marissa.html.
3 https://stopdesign.com/archive/2009/03/20/goodbye-google.html.
4 www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource- is-no-longer-oil-but-data.
The AI-Powered Workplace





As our capability to store data increased and the cost of doing so decreased, the attitude shifted to one where we would simply avoid the question of what purpose and value specific pieces of data were bringing. We started using terms such as “raw” data. Instead of storing summaries and aggregations we would store every single piece of data. Organizations started shifting into a mindset where they would store as much of it as possible, with the thinking being “you never know how it might be useful in the future, so best to hold on to it.” This is also evident in the terminology we use. We purchase “data warehouses” where we keep and organize data, or “data lakes” where we just let data flow in, as into a reservoir behind a large dam, to be collected and used at some later point. We argue about what is “big data” and what is just “data,” with some experts relishing the chance to explain that what most people think of as a lot of data is simply not that much. We celebrate companies that make no profit because they are accumulating useful data. We talk about how we “strongly believe” that data will prove valuable.

Over the past couple of years, however, cracks are beginning to appear in data’s absolute reign. Yes, we can simply store huge amounts of data at marginal extra cost, but we recognize there are other challenges. Storing data may not carry a technical cost, but it carries a regulatory and risk cost. With regulations such as the GDPR5 in place, all organizations are forced to be mindful of why and how they are keeping data. They cannot simply dump it somewhere “just in case” it might at some point be useful. Furthermore, as consumers are getting increasingly wary of the risks of cybercrime or other forms of digital manipulation (e.g., shaping opinions for political gain), they are looking for organizations that can prove they are worthy custodians of their data. Data carries with it opportunity but it can also be a liability, so clearly defining the purpose of data is once more important.

Now, if holding on to data needs to serve a purpose, what is that purpose in the context of AI-powered applications? Well, to start with we need to define what sort of AI-powered applications we are looking to build. What decision-making do we want to delegate to machines, and what model of how the world works will we need to have in place in order to enable that decision-making? Based on the model we need and the way in which we can go about creating it, we can figure out what types of data we should be using.

As we discussed in Chapter 3 there are two ways for us to uncover the necessary decision-making model that we would need to implement in order to automate processes. We can use our own knowledge of a domain to develop hypotheses and create appropriate models of behavior, or we can look for patterns in data and use algorithms that will explore that data in order to discover a possible model.

5 GDPR is the EU General Data Protection Regulation, which represents a step change in data regulation in Europe. It describes how organizations are to treat user data and penalties they may face in the case of a data breach or a failure on their side to properly handle user data.



For the latter, data serves a very clear purpose. Data is the terrain that we explore to uncover correlations. If we want to be able to build a document classifier of different types of documents, we need lots of examples of those different types of documents. If we want to uncover patterns of behavior for our teams, clients, or partners, we need data that captures that behavior. But does this mean that we should store every little piece of data potentially available to us?

That can be quite a hard question to answer outside of a specific context. My guideline right now would be to err on the side of caution and only increase your data retention capabilities alongside an increased maturity of your data governance capabilities. In other words, if you are just starting out and exploring how you could use AI techniques, don’t start by mandating that every single piece of data should be stored for potential future exploitation, no matter how tempting that may be. Start by building out structures within the organization that can ensure that you have proper data governance in place. Start by building your confidence that you are doing the right things when it comes to employees and clients and how their data is treated. The more confident you are about how well data is managed, the bolder you can be about how much data is stored.

For model-driven techniques it may feel that data is less of an important element. We are, after all, not going to require huge amounts of data in order to build our models. Don’t forget, however, that those model-driven techniques are there to manipulate data. They may require less data, but the data they will need requires more structure. For example, let’s suppose you’ve built an ontology that describes how the different types of products you produce relate to each other and the skills that are required to develop them or the materials that are required to produce them. In order to use this ontology to power queries, you would need a database where these different pieces of information are appropriately stored and annotated. To achieve this you will need a more disciplined approach to data management that ensures, for example, that as data is updated it continues to conform to your structure’s data needs.
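As a rough sketch of what such a conformance check could look like (the product types, skills, and materials below are invented purely for illustration), the ontology records what each product type requires, and updated records are validated against it:

```python
# A minimal sketch: the "ontology" maps each product type to the skills and
# materials it requires, and the check ensures updated records still conform.
# All names here are invented for illustration.

ONTOLOGY = {
    "sensor-module": {"skills": {"pcb-design", "firmware"}, "materials": {"copper"}},
    "gear-assembly": {"skills": {"cad", "machining"}, "materials": {"steel"}},
}

def conforms(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the ontology."""
    spec = ONTOLOGY.get(record.get("type"))
    if spec is None:
        return [f"unknown product type: {record.get('type')!r}"]
    problems = []
    missing_skills = spec["skills"] - set(record.get("skills", []))
    if missing_skills:
        problems.append(f"missing skills: {sorted(missing_skills)}")
    missing_materials = spec["materials"] - set(record.get("materials", []))
    if missing_materials:
        problems.append(f"missing materials: {sorted(missing_materials)}")
    return problems

record = {"type": "sensor-module", "skills": ["pcb-design"], "materials": ["copper"]}
print(conforms(record))  # the firmware skill is missing from this record
```

Running a check like this whenever data is updated is one simple way of keeping the database aligned with the structure the ontology expects.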

Whatever  AI technique you use, the ability to govern data is crucial. To get that in place you need a data strategy. In the next section we look at some practical methods to employ.


Developing a Data Strategy—from Theory to Practice
Something that I often come across as I sit down with teams to advise them on their data strategy is that initial conversations or workshops are split into two phases.



There is the first phase, where the “important” stakeholders like to get involved and dedicate some of their precious time. Here we get to define vision, goals, and “high-level” strategy. Things are upbeat in this phase. The world is going to be amazing and everyone is excited.

Then there is the second phase where, once “they” leave the room, the practicalities of getting things done start being discussed. Here things quickly go downhill. In organizations that are not tech-driven and transparent in their methods from the start, simple tasks can become incredibly challenging. This is not because of lack of will but because of how complex it is to navigate the various layers of responsibility and departmentalization of data in order to get to something useful.

I will work through an example as a way to highlight some of these issues and put in place strategies that are cognizant of them.


The Challenge
The company is a large engineering company that builds complex widgets for clients. It spans several geographical sites and services multiple industries. It has highly specialized personnel, and projects are considered mission-critical for clients. We are looking to build a tool that will support project managers in doing the initial planning: putting together teams and schedules, and raising any other considerations that may be relevant as the company is gearing up to take on a new project.

Projects always require a variety of different skills, are expected to span several months, and will occupy different roles in different ways throughout their lifetime.

Given just this brief outline, if you had to do some initial planning for a challenge like this, what types of knowledge would you need? Here is my tentative list:

• A list of who is in the organization and what their skills and past experiences are—this will allow us to figure out who might be a good fit for a given project.

• A view into their day-to-day calendars in order to set up some initial exploratory meetings.

• A view into their longer-term commitments to understand how viable their participation in the project is. There are bound to be several combinations of suitable teams, so we need to have the teams discuss and square that against their long-term commitments.



• The ability to search through past projects in the same domain for any potential areas of concern—perhaps something someone documented in the past that could inform our current actions.

This is, of course, a very high-level list but it gives us enough to explore what strategies we can put in place to tackle the task from a data perspective. Let us take each requirement in turn and explore what demands it places on our data capabilities.


A Data Strategy in Support of Communication and Collaboration

In order to know who is in the organization and what their skills are, we need an up-to-date database of people in the organization. That is obvious enough and one would think it should never be a big ask for any organization. A list should exist somewhere! Everyone presumably gets paid at the end of the month so, worst-case scenario, Finance/HR  will have something.

Well, the list does exist but, as it turns out, it is not accessible in a way that would let you build an application that uses it. The only way you would get it would be in printed format. It turns out that the HR system is quite outdated and only works on their machines. The IT department absolutely refuses to provide any access, as that would require skills that they simply don’t have anymore. They cannot guarantee the safety of data if they start tinkering with the system, so they simply leave it as it is. It’s not all bad news though. We learn that HR is planning to replace the current system with a new one—the new system is going to be rolled out in about 6 months.

Being the optimist that you are, you see that as an opportunity. “Fantastic,” you think, “let us make sure that the new system is going to be able to provide us with rich profiles of the people within the company so that we can build AI applications that can answer complex questions about who is able to do what.” Nice idea, but it turns out that HR has finished the requirements spec with the vendor, and they can’t change what was already planned now. Contracts have been signed and development has started.

You are not one who gives up easily though. You push back, get people in meetings, and passionately argue your point. You explain how this system is going to hold data that is valuable to the entire company, not just for HR purposes. You argue for strategic thinking that ensures that the company is thinking about data in a more general sense and not just within the context of specific projects. You explain how every system that is the source of data needs to enable other systems to easily access that data.



Finally, it clicks for everyone. A more general vision and strategy for data is required and should be developed in support of the organizational vision and strategy. Organizational goals such as “renew HR capabilities” and “improve project planning” are not isolated. Their data needs intersect.



■    A principle for data management should be that data is never locked into a single system; it has to be accessible by other systems.


Having multiple parties being able to access the same data is a great step in the right direction. Ensuring that that data is always accessible to your organization even as you change software vendors is a sensible insurance policy, irrespective of your AI ambitions.

Coupled to ensuring that access to data is possible is ensuring that the way data is described is widely usable. Teams should work together to develop common models of how to describe key aspects of organizational behavior such as roles and skills of people. That starts giving you superpowers. It can be a point of differentiation from the competition because you have a better handle on all the different things your teams can achieve. The better your knowledge representation capabilities, the more sophisticated your AI-powered planning tool can be. To achieve that you need teams that are all switched on to the power of data and talk across team and departmental boundaries to coordinate and uncover opportunities. It is not a one-off thing either. It requires a culture change that ensures that these conversations keep happening. Just as everything else is subject to change, so are the ways you will use to describe things.



■    Make sure that discussions around the handling and description of data happen consistently across the entire organization, to enable you to capture opportunities and identify where multiple stakeholders should have a say in how data is described.



A Data Strategy That Enables Connection and Aggregation

The next challenge is about accessing people’s calendars and understanding their day-to-day commitments. Your optimistic side kept telling you that this shouldn’t be a huge problem. Calendars are “standardized” technology, so surely you would be able to simply get access to it all.



However, it seems that calendars aren’t quite as standard as one would think. The company recently acquired two smaller companies and their infrastructure is very different. They don’t allow any “external” systems to access calendar information. You discuss the possibility of moving them to the common platform, but that triggers a cascade of dependencies on other systems they should update. There is no easy short-term solution. It will take time for everyone to align on the same systems. At this point you are faced with two options: give up on being able to integrate calendar data or tackle the problem in a different way.

Giving up, as you’ve already proved, is not what you do. You argue for building dedicated tools whose only task will be to act as a bridge between the two systems. They will extract information from the internal calendar systems and provide a safe way for external systems to access that information and make it more widely available.

This was another aha moment for the wider organization and something that can form another pillar of the overall data strategy. Sometimes change cannot be forced through alignment on the tooling used. It isn’t because of unwillingness to do so. Change sometimes needs time, and there are only so many things that you can change at once. A new acquisition has a number of issues to work through. Not having to worry about upgrading their calendar systems immediately might just be the thing to let go of.

A better way to deal with the issue is to adapt to the situation with tooling that allows you to transform and aggregate data. It will be more costly in the short term, and it will mean you are building tooling that you will eventually throw away. But you have to balance that against the organizational cost of forcing people to move too quickly on too many fronts, and the opportunity cost of not having joined-up data as early as possible.



■    Better handling of data also means supporting the creation of tooling with the explicit purpose of extracting and transforming data. At times it is simpler to remove the data management problem from the department, team, or specific software that holds the data and instead build a tool on top of that.



A Data Strategy to Unify Information through Standards

The next set of requirements is around understanding people’s long-term commitments, both historically and in the future. Once more, the optimist in you was hoping to find a single centralized tool everyone uses to plan their schedule that would give you all the information in an easy-to-understand way. Reality, as ever, is very messy. There is no single way that people manage their availability and commitments. Project teams tend to pick a tool for a specific project, and there is a lot of heterogeneity across teams and across projects.

The challenge this creates with data management is that data is lost as it is spread across a variety of different tools, and there is no overall unifying approach to it. At the same time, it is important that teams are empowered to choose the best tools for specific projects rather than being forced onto a single tool that may not always be the best fit. Do you force a single tool and eventually get the data you need but risk upsetting the teams, or do you resign yourself to the fact that you will not have access to this data?

This is where standards can play a role. You don’t have to impose a top-down solution; you can simply mandate that certain capabilities should be met by any tool that project teams pick. For example, in the case of planning and resource allocation data, a standard could stipulate that any tool used to capture where time is spent is able to export its data in a format that provides the information the organization needs to plan. This means you would have to define a format (even something as simple as an Excel spreadsheet with specific fields can do the job) and provide teams with guidelines and ways to check that their chosen tool would conform with and output to that standard.

Introducing standards across a variety of activities allows teams to still move with a certain degree of autonomy, creates a useful discussion around what should and shouldn’t be included in the standard, and has the effect of getting everyone to consider the impact of storing data in specific types of tools.
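As a minimal sketch of such a conformance check (the required field names below are purely illustrative, not taken from any real standard), a small script can verify that whatever export a team’s chosen tool produces contains the fields the standard mandates:

```python
import csv
import io

# A sketch of a conformance check for the export standard; the required
# field names are illustrative, not taken from any real standard.
REQUIRED_FIELDS = {"project", "role", "person", "hours", "week"}

def check_export(csv_text: str) -> set:
    """Return the set of required fields missing from the export's header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_FIELDS - set(reader.fieldnames or [])

sample = "project,role,person,hours,week\nwidget-x,engineer,alice,12,2019-W22\n"
print(check_export(sample))  # prints set(): the export conforms
```

Teams remain free to pick any tool they like, as long as its export passes a check of this kind.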



■    There is value in being able to change tooling and meet the specific needs of a project or team, but there is also value in consistency. Standards provide a way to navigate those two aspects.



A Data Strategy to Iteratively Improve Tooling

The final requirement is around developing the company’s capability to extract learnings from past projects in order to inform future projects.

There is a trove of information about past projects, but it is all spread across different systems and there is no way to search across everything. In addition, there are different file formats and different conventions used within the files. We would need to collect it all together, place it in a search engine of some form, and build an interface that would allow us to search through it effectively. Furthermore, in order for this search to be efficient, we need to be able to search the documents using generic terms that are appropriately expanded to terms that may be related to what we need but that we are not necessarily aware of.

For example, if someone searches for projects that involved “metal-joining,” the search engine should be able to expand that to “soldering,” “brazing,” and “flux,” since those are related terms. To achieve this, you would have to settle on a company-wide way of describing all these different things and then build tools that would classify documents appropriately so as to power search.
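A toy sketch of that expansion step, with the synonym map standing in for a curated, company-wide vocabulary of related engineering terms:

```python
# A toy sketch of query-term expansion; the synonym map stands in for a
# curated, company-wide vocabulary of related engineering terms.
EXPANSIONS = {
    "metal-joining": ["soldering", "brazing", "flux"],
}

def expand_query(terms: list) -> list:
    """Expand each query term with its known related terms, preserving order."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(EXPANSIONS.get(term, []))
    return expanded

print(expand_query(["metal-joining", "schedule"]))
# ['metal-joining', 'soldering', 'brazing', 'flux', 'schedule']
```

The hard part is not this lookup but agreeing on and maintaining the vocabulary behind it, which is exactly the company-wide modeling effort described above.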

At this point everyone is starting to realize that this is not going to be a simple task, and they are baulking at the potential scope and cost of it. Once more you need to gather the various stakeholders and produce a convincing plan that will keep everyone on board. The key here is to break the problem down into manageable phases, with each step delivering value so that it can justify the next step of investment. In the best tradition of start-up culture, you need to think big, start small, and scale quickly.

You explain how you can start quickly by collecting data behind a standard search engine. It may not provide the full benefit of an NLP-powered engine, but it is a more manageable task that will deliver immediate value. At the same time, a cross-functional team is created to start developing appropriate models of the type of knowledge that a more powerful search engine can start manipulating. Finally, company guidelines can be developed to ensure that every project adds to its list of debriefing tasks the transfer of project knowledge into data stores that can be accessed by the search engine.
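The phase-one idea can be sketched as a plain keyword index over project documents, with no NLP involved (the document names and contents below are invented):

```python
# A toy sketch of phase one: a plain keyword index over project documents,
# with no NLP involved. Document names and contents are invented.
from collections import defaultdict

def build_index(documents: dict) -> dict:
    """Map each word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

docs = {
    "project-a-debrief": "soldering issues on the sensor line",
    "project-b-debrief": "schedule overrun due to brazing rework",
}
index = build_index(docs)
print(sorted(index["soldering"]))  # ['project-a-debrief']
```

Even something this simple delivers immediate value, and it gives the later NLP-powered phase a corpus to work with.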

At a second phase you can start applying automation via the introduction of NLP to better understand the types of documents available and make search better for users. You will integrate this functionality with your collaboration environment so that people can run searches through natural language questions. It is now starting to look like a fun project to do, and one that is actually manageable. While challenges will always exist, there is less initial fear to get things started.



■    Data needs to be prepared in order for it to be useful, and projects need to be broken down into iterative steps that provide value at each phase. As with any new venture, take a “think big, start small, and scale fast” approach to problem solving.



From Data to Understanding
Ultimately, everything that we do around data is there to lead to understanding. Nobody should be proud of how many terabytes of information they have stored and how much computing power they are expending to analyze it.



There is no intrinsic value in that. There is only value in your capability to use data in a way that will provide you with insight and enable you to get things done.

In this chapter we looked at the high-level purpose of data for both data-driven and model-driven techniques, and discussed how increased use of data to power automated decision-making has to be accompanied by increased confidence in data governance.

Finally, we looked at a few different strategies you can employ in order to solve real-world problems. Deciding to tackle a single problem and using that to inform your wider organizational strategy is a great way to get started. Learnings will always vary from team to team, and it is important to gain practical experience. These bottom-up learnings tied to top-down willingness to invest can provide a winning approach.

Once a first problem is successfully tackled, the whole organization gains more confidence and the process can be repeated until it becomes part of standard procedure rather than a one-off experiment. It is only then that real transformation happens, when the process is embedded in the culture and is no longer a novelty activity.
Chapter 11 | Defining an AI Strategy


In 2017 at the annual Google I/O conference, Sundar Pichai, Google’s CEO, stood up on stage and announced that the opportunities afforded by AI were such that Google was moving from a mobile-first to an AI-first strategy. That statement really brought the message home to many about the importance of AI and the need to formulate an AI strategy. Google, the technology behemoth, was turning into a full-blown AI company. They had alluded to it in the past; everyone knew that Google was heavily invested in AI and using it extensively, but this was different. It left no space for doubt. AI-first.

If we stop for a second and analyze it, what does the move from mobile-first to AI-first really mean though? Well, the mobile-first strategy was one that recognized that the primary way people access digital services is through mobile devices. As such, any product Google produced needed to work on mobiles, first and foremost. A desktop-only product was simply not an option. Google’s hypothesis behind the mobile-first strategy was that if it provided the smoothest and fastest mobile experience, users would flock to their products. The challenge was that mobile-first development at the time required a focused effort and presented technical challenges. It needed support from the top, as it created additional product development resource demands. That Google strategy soon became the norm across technology companies. These days it is not a strategy to say that you are mobile-first.


