
Chapter 10 | From Data to Understanding



Silicon Valley companies, with the likes of Google leading the charge. The classic anecdote that symbolizes their reliance on data is how Google ran an experiment to collect data on what shade of blue (out of 41 candidates2) would be most appropriate for a button. A designer who left Google because of this type of behavior said it was an indication of the lack of appreciation of intuition and creativity, necessary preconditions for innovation in design.3

Data-driven AI techniques have only exacerbated the issue and strengthened the argument of those who call for every decision to be data driven. A favorite statement to throw around in these data-obsessed times is that data is the new oil.4 It should be treated as a commodity that an organization drills for, extracts, transforms, and uses to generate value for itself. Thankfully, at the same time there are more balanced voices that caution us not to adopt such a data-centric view of the world. In this chapter we are going to follow that more balanced approach to how we can think of data within an organization.

The aim of this chapter is to provide some starting points and useful strategies for how to think of data in your own organization. Despite voicing concerns about overly relying on data, I strongly believe that data is crucial. It is a valuable resource and we need to treat it as such. We should always, however, question and challenge the notion that data is valuable intrinsically, outside of a specific context and purpose. This will ensure that the effort we put into data management and curation is directed and efficient.


Giving Data Purpose
In considering the purpose of data it is useful to take a look back at how data got elevated to be considered the most valuable commodity, and how that position is now being reconsidered or, at least, should be reconsidered.

Being able to store information has always been useful. However, for a long time the potential opportunities were tempered by the cost associated with storing and dealing with data. When data storage was counted in a few megabytes and the devices to handle it took up entire floors in company buildings, you had to carefully consider why you were doing it. As such, retaining data was something most organizations would avoid beyond what was necessary to enable the proper functioning of the organization. The purpose of data had to be very specific, because holding on to it and usefully exploiting it had a very noticeable cost.




2 www.nytimes.com/2009/03/01/business/01marissa.html.
3 https://stopdesign.com/archive/2009/03/20/goodbye-google.html.
4 www.economist.com/leaders/2017/05/06/the-worlds-most-valuable-resource- is-no-longer-oil-but-data.
The AI-Powered Workplace





As our capability to store data increased and the cost of doing so decreased, the attitude shifted to one where we would simply avoid the question of what purpose and value specific pieces of data were bringing. We started using terms such as “raw” data. Instead of storing summaries and aggregations we would store every single piece of data. Organizations started shifting into a mindset where they would store as much of it as possible, with the thinking being “you never know how it might be useful in the future, so best to hold on to it.” This is also evident in the terminology we use. We purchase “data warehouses” where we keep and organize data, or “data lakes” where we just let data flow in, as into a reservoir behind a large dam, to be collected and used at some later point. We argue about what is “big data” and what is just “data,” with some experts relishing the chance to explain that what most people think of as a lot of data is simply not that much. We celebrate companies that make no profit because they are accumulating useful data. We talk about how we “strongly believe” that data will prove valuable.

Over the past couple of years, however, cracks are beginning to appear in data’s absolute reign. Yes, we can simply store huge amounts of data at marginal extra cost, but we recognize there are other challenges. Storing data may not carry a technical cost, but it carries a regulatory and risk cost. With regulations such as the GDPR5 in place, all organizations are forced to be mindful of why and how they are keeping data. They cannot simply dump it somewhere “just in case” it might at some point be useful. Furthermore, as consumers are getting increasingly wary of the risks of cybercrime or other forms of digital manipulation (e.g., shaping opinions for political gain), they are looking for organizations that can prove they are worthy custodians of their data. Data carries with it opportunity but it can also be a liability, so clearly defining the purpose of data is once more important.

Now, if holding on to data needs to serve a purpose, what is that purpose in the context of AI-powered applications? Well, to start with we need to define what sort of AI-powered applications we are looking to build. What decision-making do we want to delegate to machines, and what model of how the world works will we need to have in place in order to enable that decision-making? Based on the model we need and the way in which we can go about creating it, we can figure out what types of data we should be using.

As we discussed in Chapter 3 there are two ways for us to uncover the necessary decision-making model that we would need to implement in order to automate processes. We can use our own knowledge of a domain to develop hypotheses and create appropriate models of behavior, or we can look for patterns in data and use algorithms that will explore that data in order to discover a possible model.

5 GDPR is the EU General Data Protection Regulation, which represents a step change in data regulation in Europe. It describes how organizations are to treat user data and penalties they may face in the case of a data breach or a failure on their side to properly handle user data.



For the latter, data serves a very clear purpose. Data is the terrain that we explore to uncover correlations. If we want to be able to build a document classifier of different types of documents, we need lots of examples of those different types of documents. If we want to uncover patterns of behavior for our teams, clients, or partners, we need data that captures that behavior. But does this mean that we should store every little piece of data potentially available to us?

That can be quite a hard question to answer outside of a specific context. My guideline right now would be to err on the side of caution and only increase your data retention capabilities alongside an increased maturity of your data governance capabilities. In other words, if you are just starting out and exploring how you could use AI techniques, don’t start by mandating that every single piece of data should be stored for potential future exploitation, no matter how tempting that may be. Start by building out structures within the organization that can ensure that you have proper data governance in place. Start by building your confidence that you are doing the right things when it comes to employees and clients and how their data is treated. The more confident you are about how well data is managed, the bolder you can be about how much data is stored.

For model-driven techniques it may feel that data is less of an important element. We are, after all, not going to require huge amounts of data in order to build our models. Don’t forget, however, that those model-driven techniques are there to manipulate data. They may require less data, but the data they will need requires more structure. For example, let’s suppose you’ve built an ontology that describes how the different types of products you produce relate to each other and the skills that are required to develop them or the materials that are required to produce them. In order to use this ontology to power queries, you would need a database where these different pieces of information are appropriately stored and annotated. To achieve this you will need a more disciplined approach to data management that ensures, for example, that as data is updated it continues to conform to your structure’s data needs.
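As a rough sketch of what such a conformance check could look like (the product types, skills, and materials below are invented purely for illustration), the ontology records what each product type requires, and updated records are validated against it:

```python
# A minimal sketch: the "ontology" maps each product type to the skills and
# materials it requires, and the check ensures updated records still conform.
# All names here are invented for illustration.

ONTOLOGY = {
    "sensor-module": {"skills": {"pcb-design", "firmware"}, "materials": {"copper"}},
    "gear-assembly": {"skills": {"cad", "machining"}, "materials": {"steel"}},
}

def conforms(record: dict) -> list:
    """Return a list of problems; an empty list means the record fits the ontology."""
    spec = ONTOLOGY.get(record.get("type"))
    if spec is None:
        return [f"unknown product type: {record.get('type')!r}"]
    problems = []
    missing_skills = spec["skills"] - set(record.get("skills", []))
    if missing_skills:
        problems.append(f"missing skills: {sorted(missing_skills)}")
    missing_materials = spec["materials"] - set(record.get("materials", []))
    if missing_materials:
        problems.append(f"missing materials: {sorted(missing_materials)}")
    return problems

record = {"type": "sensor-module", "skills": ["pcb-design"], "materials": ["copper"]}
print(conforms(record))  # the firmware skill is missing from this record
```

Running a check like this whenever data is updated is one simple way of keeping the database aligned with the structure the ontology expects.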

Whatever  AI technique you use, the ability to govern data is crucial. To get that in place you need a data strategy. In the next section we look at some practical methods to employ.


Developing a Data Strategy—from Theory to Practice
Something that I often come across as I sit down with teams to advise them on their data strategy is that initial conversations or workshops are split into two phases.



There is the first phase, where the “important” stakeholders like to get involved and dedicate some of their precious time. Here we get to define vision, goals, and “high-level” strategy. Things are upbeat in this phase. The world is going to be amazing and everyone is excited.

Then there is the second phase where, once “they” leave the room, the practicalities of getting things done start being discussed. Here things quickly go downhill. In organizations that are not tech-driven and transparent in their methods from the start, simple tasks can become incredibly challenging. This is not because of lack of will but because of how complex it is to navigate the various layers of responsibility and departmentalization of data in order to get to something useful.

I will work through an example as a way to highlight some of these issues and put in place strategies that are cognizant of them.


The Challenge
The company is a large engineering company that builds complex widgets for clients. It spans several geographical sites and services multiple industries. It has highly specialized personnel, and projects are considered mission-critical for clients. We are looking to build a tool that will support project managers in doing the initial planning: putting together teams and schedules, and raising any other considerations that may be relevant as the company is gearing up to take on a new project.

Projects always require a variety of different skills, are expected to span several months, and will occupy different roles in different ways throughout their lifetime.

Given just this brief outline, if you had to do some initial planning for a challenge like this, what types of knowledge would you need? Here is my tentative list:

• A list of who is in the organization and what their skills and past experiences are—this will allow us to figure out who might be a good fit for a given project.

• A view into their day-to-day calendars in order to set up some initial exploratory meetings.

• A view into their longer-term commitments to understand how viable their participation in the project is. There are bound to be several combinations of suitable teams, so we need to have the teams discuss and square that against their long-term commitments.



• The ability to search through past projects in the same domain for any potential areas of concern—perhaps something someone documented in the past that could inform our current actions.

This is, of course, a very high-level list but it gives us enough to explore what strategies we can put in place to tackle the task from a data perspective. Let us take each requirement in turn and explore what demands it places on our data capabilities.


A Data Strategy in Support of Communication and Collaboration

In order to know who is in the organization and what their skills are, we need an up-to-date database of people in the organization. That is obvious enough and one would think it should never be a big ask for any organization. A list should exist somewhere! Everyone presumably gets paid at the end of the month so, worst-case scenario, Finance/HR  will have something.

Well, the list does exist but, as it turns out, it is not accessible in a way that would let you build an application that uses it. The only way you would get it would be in printed format. It turns out that the HR system is quite outdated and only works on their machines. The IT department absolutely refuses to provide any access, as that would require skills that they simply don’t have anymore. They cannot guarantee the safety of data if they start tinkering with the system, so they simply leave it as it is. It’s not all bad news though. We learn that HR is planning to replace the current system with a new one—the new system is going to be rolled out in about 6 months.

Being the optimist that you are, you see that as an opportunity. “Fantastic,” you think, “let us make sure that the new system is going to be able to provide us with rich profiles of the people within the company so that we can build AI applications that can answer complex questions about who is able to do what.” Nice idea, but it turns out that HR has finished the requirements spec with the vendor, and they can’t change what was already planned now. Contracts have been signed and development has started.

You are not one who gives up easily though. You push back, get people in meetings, and passionately argue your point. You explain how this system is going to hold data that is valuable to the entire company, not just for HR purposes. You argue for strategic thinking that ensures that the company is thinking about data in a more general sense and not just within the context of specific projects. You explain how every system that is the source of data needs to enable other systems to easily access that data.



Finally, it clicks for everyone. A more general vision and strategy for data is required and should be developed in support of the organizational vision and strategy. Organizational goals such as “renew HR capabilities” and “improve project planning” are not isolated. Their data needs intersect.



■    A principle for data management should be that data is never locked into a single system; it has to be accessible by other systems.


Having multiple parties being able to access the same data is a great step in the right direction. Ensuring that that data is always accessible to your organization even as you change software vendors is a sensible insurance policy, irrespective of your AI ambitions.

Coupled to ensuring that access to data is possible is ensuring that the way data is described is widely usable. Teams should work together to develop common models of how to describe key aspects of organizational behavior such as roles and skills of people. That starts giving you superpowers. It can be a point of differentiation from the competition because you have a better handle on all the different things your teams can achieve. The better your knowledge representation capabilities, the more sophisticated your AI-powered planning tool can be. To achieve that you need teams that are all switched on to the power of data and talk across team and departmental boundaries to coordinate and uncover opportunities. It is not a one-off thing either. It requires a culture change that ensures that these conversations keep happening. Just as everything else is subject to change, so are the ways you will use to describe things.



■    Make sure that discussions around the handling and description of data happen consistently across the entire organization, to enable you to capture opportunities and identify where multiple stakeholders should have a say in how data is described.



A Data Strategy That Enables Connection and Aggregation

The next challenge is about accessing people’s calendars and understanding their day-to-day commitments. Your optimistic side kept telling you that this shouldn’t be a huge problem. Calendars are “standardized” technology, so surely you would be able to simply get access to it all.



However, it seems that calendars aren’t quite as standard as one would think. The company recently acquired two smaller companies and their infrastructure is very different. They don’t allow any “external” systems to access calendar information. You discuss the possibility of moving them to the common platform, but that triggers a cascade of dependencies on other systems they should update. There is no easy short-term solution. It will take time for everyone to align on the same systems. At this point you are faced with two options: give up on being able to integrate calendar data or tackle the problem in a different way.

Giving up, as you’ve already proved, is not what you do. You argue for building dedicated tools whose only task will be to act as a bridge between the two systems. They will extract information from the internal calendar systems and provide a safe way for external systems to access that information and make it more widely available.

This was another aha moment for the wider organization and something that can form another pillar of the overall data strategy. Sometimes change cannot be forced through alignment on the tooling used. It isn’t because of unwillingness to do so. Change sometimes needs time, and there are only so many things that you can change at once. A new acquisition has a number of issues to work through. Not having to worry about upgrading their calendar systems immediately might just be the thing to let go of.

A better way to deal with the issue is to adapt to the situation with tooling that allows you to transform and aggregate data. It will be more costly in the short term, and it will mean you are building tooling that you will eventually throw away. But you have to balance that against the organizational cost of forcing people to move too quickly on too many fronts, and the opportunity cost of not having joined-up data as early as possible.



■    Better handling of data also means supporting the creation of tooling with the explicit purpose of extracting and transforming data. At times it is simpler to remove the data management problem from the department, team, or specific software that holds the data and instead build a tool on top of that.



A Data Strategy to Unify Information through Standards

The next set of requirements is around understanding people’s long-term commitments, both historically and in the future. Once more, the optimist in you was hoping to find a single centralized tool everyone uses to plan their schedule that would give you all the information in an easy-to-understand way. Reality, as ever, is very messy. There is no single way that people manage their availability and commitments. Project teams tend to pick a tool for a specific project, and there is a lot of heterogeneity across teams and across projects.

The challenge this creates with data management is that data is lost as it is spread across a variety of different tools, and there is no overall unifying approach to it. At the same time, it is important that teams are empowered to choose the best tools for specific projects rather than being forced onto a single tool that may not always be the best fit. Do you force a single tool and eventually get the data you need but risk upsetting the teams, or do you resign yourself to the fact that you will not have access to this data?

This is where standards can play a role. You don’t have to impose a top-down solution; you can simply mandate that certain capabilities should be met by any tool that project teams pick. For example, in the case of planning and resource allocation data, a standard could stipulate that any tool used to capture where time is spent is able to export its data in a format that provides the information the organization needs to plan. This means you would have to define a format (even something as simple as an Excel spreadsheet with specific fields can do the job) and provide teams with guidelines and ways to check that their chosen tool would conform with and output to that standard.

Introducing standards across a variety of activities allows teams to still move with a certain degree of autonomy, creates a useful discussion around what should and shouldn’t be included in the standard, and has the effect of getting everyone to consider the impact of storing data in specific types of tools.
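As a minimal sketch of such a conformance check (the required field names below are purely illustrative, not taken from any real standard), a small script can verify that whatever export a team’s chosen tool produces contains the fields the standard mandates:

```python
import csv
import io

# A sketch of a conformance check for the export standard; the required
# field names are illustrative, not taken from any real standard.
REQUIRED_FIELDS = {"project", "role", "person", "hours", "week"}

def check_export(csv_text: str) -> set:
    """Return the set of required fields missing from the export's header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return REQUIRED_FIELDS - set(reader.fieldnames or [])

sample = "project,role,person,hours,week\nwidget-x,engineer,alice,12,2019-W22\n"
print(check_export(sample))  # prints set(): the export conforms
```

Teams remain free to pick any tool they like, as long as its export passes a check of this kind.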



■    There is value in being able to change tooling and meet the specific needs of a project or team, but there is also value in consistency. Standards provide a way to navigate those two aspects.



A Data Strategy to Iteratively Improve Tooling

The final requirement is around developing the company’s capability to extract learnings from past projects in order to inform future projects.

There is a trove of information about past projects, but it is all spread across different systems and there is no way to search across everything. In addition, there are different file formats and different conventions used within the files. We would need to collect it all together, place it in a search engine of some form, and build an interface that would allow us to search through it effectively. Furthermore, in order for this search to be efficient, we need to be able to search the documents using generic terms that are appropriately expanded to terms that may be related to what we need but that we are not necessarily aware of.

For example, if someone searches for projects that involved “metal-joining,” the search engine should be able to expand that to “soldering,” “brazing,” and “flux,” since those are related terms. To achieve this, you would have to settle on a company-wide way of describing all these different things and then build tools that would classify documents appropriately so as to power search.
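A toy sketch of that expansion step, with the synonym map standing in for a curated, company-wide vocabulary of related engineering terms:

```python
# A toy sketch of query-term expansion; the synonym map stands in for a
# curated, company-wide vocabulary of related engineering terms.
EXPANSIONS = {
    "metal-joining": ["soldering", "brazing", "flux"],
}

def expand_query(terms: list) -> list:
    """Expand each query term with its known related terms, preserving order."""
    expanded = []
    for term in terms:
        expanded.append(term)
        expanded.extend(EXPANSIONS.get(term, []))
    return expanded

print(expand_query(["metal-joining", "schedule"]))
# ['metal-joining', 'soldering', 'brazing', 'flux', 'schedule']
```

The hard part is not this lookup but agreeing on and maintaining the vocabulary behind it, which is exactly the company-wide modeling effort described above.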

At this point everyone is starting to realize that this is not going to be a simple task, and they are baulking at the potential scope and cost of it. Once more you need to gather the various stakeholders and produce a convincing plan that will keep everyone on board. The key here is to break the problem down into manageable phases, with each step delivering value so that it can justify the next step of investment. In the best tradition of start-up culture, you need to think big, start small, and scale quickly.

You explain how you can start quickly by collecting data behind a standard search engine. It may not provide the full benefit of an NLP-powered engine, but it is a more manageable task that will deliver immediate value. At the same time, a cross-functional team is created to start developing appropriate models of the type of knowledge that a more powerful search engine can start manipulating. Finally, company guidelines can be developed to ensure that every project adds to its list of debriefing tasks the transfer of project knowledge into data stores that can be accessed by the search engine.
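The phase-one idea can be sketched as a plain keyword index over project documents, with no NLP involved (the document names and contents below are invented):

```python
# A toy sketch of phase one: a plain keyword index over project documents,
# with no NLP involved. Document names and contents are invented.
from collections import defaultdict

def build_index(documents: dict) -> dict:
    """Map each word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

docs = {
    "project-a-debrief": "soldering issues on the sensor line",
    "project-b-debrief": "schedule overrun due to brazing rework",
}
index = build_index(docs)
print(sorted(index["soldering"]))  # ['project-a-debrief']
```

Even something this simple delivers immediate value, and it gives the later NLP-powered phase a corpus to work with.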

At a second phase you can start applying automation via the introduction of NLP to better understand the types of documents available and make search better for users. You will integrate this functionality with your collaboration environment so that people can run searches through natural language questions. It is now starting to look like a fun project to do, and one that is actually manageable. While challenges will always exist, there is less initial fear to get things started.



■    Data needs to be prepared in order for it to be useful, and projects need to be broken down into iterative steps that provide value at each phase. As with any new venture, take a “think big, start small, and scale fast” approach to problem solving.



From Data to Understanding
Ultimately, everything that we do around data is there to lead to understanding. Nobody should be proud of how many terabytes of information they have stored and how much computing power they are expending to analyze it.



There is no intrinsic value in that. There is only value in your capability to use data in a way that will provide you with insight and enable you to get things done.

In this chapter we looked at the high-level purpose of data for both data-driven and model-driven techniques, and discussed how increased use of data to power automated decision-making has to be accompanied by increased confidence in data governance.

Finally, we looked at a few different strategies you can employ in order to solve real-world problems. Deciding to tackle a single problem and using that to inform your wider organizational strategy is a great way to get started. Learnings will always vary from team to team, and it is important to gain practical experience. These bottom-up learnings tied to top-down willingness to invest can provide a winning approach.

Once a first problem is successfully tackled, the whole organization gains more confidence and the process can be repeated until it becomes part of standard procedure rather than a one-off experiment. It is only then that real transformation happens, when the process is embedded in the culture and is no longer a novelty activity.
Chapter 11 | Defining an AI Strategy


In 2017 at the annual Google I/O conference, Sundar Pichai, Google’s CEO, stood up on stage and announced that the opportunities afforded by AI were such that Google was moving from a mobile-first to an AI-first strategy. That statement really brought the message home to many about the importance of AI and the need to formulate an AI strategy. Google, the technology behemoth, was turning into a full-blown AI company. They had alluded to it in the past; everyone knew that Google was heavily invested in AI and using it extensively, but this was different. It left no space for doubt. AI-first.

If we stop for a second and analyze it, what does the move from mobile-first to AI-first really mean though? Well, the mobile-first strategy was one that recognized that the primary way people access digital services is through mobile devices. As such, any product Google produced needed to work on mobiles, first and foremost. A desktop-only product was simply not an option. Google’s hypothesis behind the mobile-first strategy was that if it provided the smoothest and fastest mobile experience, users would flock to their products. The challenge was that mobile-first development at the time required a focused effort and presented technical challenges. It needed support from the top, as it created additional product development resource demands. That Google strategy soon became the norm across technology companies. These days it is not a strategy to say that you are mobile-first.


