
Chapter 5 | Core AI Capabilities



In this chapter we will focus on three broad classes of capabilities that represent the most frequent types we encounter in a work environment, and the most likely to provide immediate benefits:

The ability to understand, manipulate, and generate language (both voice and text)

The ability to manipulate images, classify them, and identify specific objects in images

The ability to combine organization-specific knowledge and data to create organization-specific capabilities: our very own superpowers that can be incredibly hard for others to replicate

The aim of the chapter is to give you a high-level understanding of how these capabilities work and examples of their application, so as to demystify the processes and allow you to more clearly consider how you could exploit them in your own work environment.


Language
Language is a critical capability that organizations should be looking to exploit as much as possible. As knowledge workers, our currency, in many ways, is words. Whatever the end result of the activity of any office may be, the way we collaborate with colleagues and share ideas is through language.

Language has some fascinating idiosyncrasies and calls from the outset for a rich and interdisciplinary approach. It would be impossible to cover all the challenges here, but I think it is useful to consider a few so as to better comprehend the scale of the task and realize what an incredible amount of progress has taken place.

To start with, there are obviously multiple languages to deal with. Luckily, different languages present several similar characteristics, which means that techniques developed to handle one language can often be applied to others, with the main caveat being the availability of large enough data in the language we are looking to analyze.1 Language, however, is not static. The English spoken in the UK today is very different from that of past centuries, and the English spoken in the United States or Australia is sufficiently different from that of the UK that different language models and datasets may be required. Language also morphs as it moves from one domain to another. If two experts in civil engineering listen in on the conversation of two experts in aerospace engineering, they may understand most of the individual words, but the overall meaning will be lost to them. Words take on new meanings, acronyms are introduced, and quite often, especially in spoken language, slang is used that only makes sense in very specific contexts and time periods. I am sure that if I asked my dad to “Slack me” he would have a very puzzled look, but if I said, “Skype me” he would understand and likely reply with “Why don’t we just use FaceTime, shall we?”

1 While languages do have some innately similar characteristics, we should be careful to not overgeneralize. A more nuanced statement would be to say that languages with similar heritage share similar characteristics.

Then there is the issue of understanding what we say when we speak and transcribing that to text. Our accent, the acoustics of the space, whether we have a cold or not, background noise, or other people talking at the same time all come into play to influence what sounds will reach the machine, which needs to then isolate the specific data it cares about and transform that into words. Once more, it’s not just about a faithful transcription of the sounds into words. We structure things differently when we speak. We add “ums” and “ahs” and stop and start in strange ways that somehow all make sense to us but are not the same way we write.

As you can see, the challenges are considerable, and it is amazing that we now have readily available AI tools that allow us to recognize speech, transcribe that to text, understand its meaning, and even generate language. We haven’t solved all the problems, but we’ve solved enough of them to make these tools viable for use in the development of AI-powered applications.

We briefly consider the implications of all this in the sections that follow, across speech recognition, natural language processing (NLP), translation, and natural language generation.


Speech Recognition
Speech recognition deals with our ability to transform the sounds that we produce when we speak into text. It is often also referred to as ASR, which stands for automatic speech recognition. Quite easily an entire field of study on its own, it combines a breathtaking set of technologies.

An ASR system starts by picking up the sound of our voice through a microphone. That signal gets cleaned and processed in the hope of isolating only those frequencies that represent a human voice. Those analogue continuous sound waves are then sampled and translated into what are referred to as speech frames (a couple of dozen milliseconds of sampled waveform information). Speech frames are then used to help us understand what phonemes the user has uttered. Phonemes are units of sound that combine to give us words and are used to differentiate between words—the linguist’s equivalent to a grammatical syllable.2 Linguists define the specific phonemes of each language and how they combine into words; that knowledge is then used by ASR systems. This information is then further combined with a pronunciation model and a language model, nowadays largely based on deep learning, to produce the final text.

2 For example, the word “five” would be represented with three phonemes: “F-ay-v.”

Speech recognition systems, especially after the huge enhancements introduced by improved neural network algorithms, provide an impressive amount of accuracy (all major technology companies report human-level or better accuracy, with error rates close to or below 5%). That does not mean, however, that we can assume they will be able to tackle any situation with ease. The specific context needs to be taken into account, and a realistic investigation needs to happen into the viability of using speech recognition to solve a given problem. You have probably already noticed how voice assistants are not that effective in crowded rooms with lots of other people speaking, whereas they perform much more reliably in a car where outside sounds are cut out.
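To make this concrete, here is a minimal sketch of what calling an ASR engine from code can look like, assuming the open source SpeechRecognition Python package (pip install SpeechRecognition) and a placeholder recording named meeting.wav. The sampling, phoneme detection, and language modeling described earlier all happen inside the recognition service; our code only supplies the audio.

import speech_recognition as sr

recognizer = sr.Recognizer()

# "meeting.wav" is a placeholder for your own recording.
with sr.AudioFile("meeting.wav") as source:
    audio = recognizer.record(source)  # read the whole file into memory

try:
    # Send the audio to Google's free web recognition API.
    print("Transcript:", recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("The audio could not be understood.")
except sr.RequestError as err:
    print("Could not reach the recognition service:", err)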

The domain of discourse is also very important. Here is a very simple experiment you can run on your own to understand how it can affect speech recognition. Call up whatever voice assistant you have on your smartphone, be it Siri, Cortana, or the Google Assistant. First try telling them something that might be said in your work setting using domain-specific terminology, and then try an everyday phrase that is about dealing with more general life tasks. Look at the transcription of the text to see how accurate each got it.

I used the following work-related sentence:

“The high-level objective for Task 1 is to produce a chatbot that is able to assist a user to search through multiple document repositories  that are accessed through a federated search service.”

This is a relatively friendly test. There are some domain-specific keywords, but they are not too arcane. I am sure you, the reader, will have no difficulty with the individual words, although you may have some questions about the overall meaning; for example, what exactly is a federated search service?

Google Assistant came back with:

“The high-level objectives for task wants to produce a chat but the table to sister user to search through multiple  document repositories  access with federated search service.”

Siri gave me:

“The high-level objective for task one is to produce a chalkboard that is able to sister user to search through multiple document repository other access through federated search service.”



Those are admirable efforts, but not very usable. However, if I try the following sentence:

“Remind me to drop off the kids at school then go collect groceries, pass by the pharmacy, and then meet Julia for late breakfast.”

Siri gets it word for word correct and so does Google Assistant. I didn’t even have to enunciate too carefully, something that I did do in the previous example.

Clearly, they work well for exactly what they were designed for: to help us handle everyday life, rather than transcribe domain-specific information. It is no surprise that one of the leading transcription software companies, Nuance, provides different software solutions for different industries such as legal, professional, and law enforcement. Each solution advertises the fact that it has been trained for that industry’s specific vocabulary, precisely because that is a necessary precondition for effective operation in that industry.

In summary, although speech recognition has come a long way, it is important to keep a realistic view of where it can currently help, especially in an office setting. It can be extremely effective, and less onerous to train, if we want to use voice to issue straightforward commands or directions to a machine. In these cases, we are only uttering short phrases with a specific intent, such as “Open Microsoft Word” or “Call my HR contact.” It becomes more challenging if we are trying to use it to transcribe complex phrases with domain-specific (and especially acronym-heavy) content.


Natural Language Processing
With speech recognition we go from sound to text. Once we have text, though, how do we understand it and what can we do with it? This is where NLP comes into play. Let’s look at some of the key stages, both to understand what is possible and as a way to inspire ideas of how you can use it in your own work environment.


Analysis and Entity Extraction

The first stage is, typically, the syntactic analysis of the text we want to understand and something called entity extraction. Consider a simple phrase such as:

“This is a book on the use of artificial intelligence in the office. It’s published by Apress, part of Springer Nature.”



To start with, we need to break up the text into its individual components, understand what constitutes punctuation and what does not, and how that affects the sentence structure.

Using Google’s NLP demo3  we get an analysis such as the one in Figure 5-1.




Figure 5-1. Syntax analysis of a phrase using Google’s NLP API

You can see there is quite a bit going on. The NLP system has been able to successfully identify all the different words, including where we’ve used apostrophes such as “it’s.” It is also identifying nouns, verbs, punctuation, adjectives, and more.

Entity extraction is able to tell us that book, Apress, Springer Nature, and artificial intelligence are all salient entities in this piece of text, and for some, such as “Springer Nature” and “Apress,” it is able to say that they are organizations and provide links to their websites.

With just this information we can start thinking of a search powered by NLP that can be so much more effective than a “normal” search that only compares strings without any contextual information—a search that will, for example, be able to distinguish between when a specific organization is mentioned, such as Apple, instead of simply the fruit apple. Imagine being able to search through your document store and then filter against mentions of a specific company, product, or coworker names just the same way you filter against different brands on Amazon.com, without having had to painstakingly annotate those documents up front. The NLP system can do the heavy lifting for us.
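If you want to experiment beyond an online demo, open source libraries offer the same building blocks. Here is a minimal sketch using the spaCy library (one choice among many; pip install spacy and download its small English model first) to run syntactic analysis and entity extraction on the same phrase:

import spacy

# Load the small English model (python -m spacy download en_core_web_sm).
nlp = spacy.load("en_core_web_sm")

text = ("This is a book on the use of artificial intelligence in the "
        "office. It's published by Apress, part of Springer Nature.")
doc = nlp(text)

# Syntactic analysis: the part of speech of each token.
for token in doc:
    print(token.text, token.pos_)

# Entity extraction: detected entities with their types (e.g., ORG).
for ent in doc.ents:
    print(ent.text, ent.label_)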




3 https://cloud.google.com/natural-language/#natural-language-api-demo.


Classification

What is the document about? Is it a sales report, meeting notes, or a pitch to win a new contract? Are the sentiments expressed within a document positive, negative, or neutral? Is the content sensitive or potentially offensive?

Classification, and in particular classification that is relevant to your specific needs, is one of the most frequent applications of NLP.

NLP tools have become particularly adept at this, and the good news is that it is already possible to train your own organization-specific classifiers with minimal specialized expertise. This is possible because you can base your classifier on existing language models (that have been prepared on much larger datasets) and specialize them with either rule-based classification or data-driven approaches that work as a layer on top of the existing models.
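As a rough illustration of the data-driven route, here is a minimal sketch of an organization-specific document classifier built with the open source scikit-learn library. The three training snippets and their labels are invented placeholders; a real classifier would be trained on a large sample of your own documents, ideally on top of a pretrained language model as described above:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: document snippets with their categories.
train_texts = [
    "Q3 revenue grew 12% across the EMEA region",
    "Attendees: Maria, Tom. Action items from Monday's sync",
    "We propose a three-phase rollout for the client engagement",
]
train_labels = ["sales report", "meeting notes", "pitch"]

# TF-IDF features feeding a logistic regression: a simple baseline,
# not a large pretrained language model.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

print(classifier.predict(["Notes and action items from the weekly planning meeting"]))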


Intent Extraction

When we are using language to communicate, especially when we are asking someone to do something for us, our words can be mapped to a specific intent. For example, if I say:

“Could you please open the window?”

The intent is quite clear. I am asking someone to open the window for me. However, I could also say:

“It’s hot; can you let some air come through?”

Although I didn’t explicitly say “open the window,” the intent is the same.

The job of intent extraction is to help us understand what action is conveyed in the words. As you can imagine, it is particularly important in conversational engines that power chatbots. They need to be able to map all the myriad ways we, as humans, can say something to a specific response or action. In addition, they need to do that while taking contextual information into consideration. Consider the following dialog.

Human: “I’d like two pizzas, a Coke, and some garlic bread.”
Bot: “Thanks, what type of pizzas would you like?”
Human: “A pepperoni pizza and a margherita. Oh, make that Coke a Sprite.”



What to us is a very simple dialog is quite a challenge for a bot. It asked the user for the types of pizzas, but it also got some information about a change in the drinks order. It needs to understand that the phrase the user uttered carried two intents, that the second intent was a reference to the previous phrase about drinks, and that it is about changing the existing Coke to a Sprite!

Nowadays there is a wide range of tooling to help organizations develop applications that can handle such conversations, and problems like the preceding one can be solved in well-defined domains. The key is to clearly weigh where intent extraction and conversations would be most effective. It is a balancing act between the complexity of the NLP problem to be solved and the value the solution is going to generate.
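To give a feel for the basic mapping involved, here is a deliberately naive sketch of intent extraction using keyword matching. Everything in it (the intent names, the keywords) is invented for illustration; real conversational platforms use trained models and track context across turns, but the core idea of mapping many phrasings to a few intents is the same:

# Each intent is associated with phrases that signal it. A production
# system would learn these associations from labeled example utterances.
INTENT_KEYWORDS = {
    "order_food": ["pizza", "garlic bread"],
    "change_order": ["make that", "change the"],
}

def extract_intents(utterance):
    """Return every intent whose keywords appear in the utterance.

    A single phrase can carry more than one intent, as in the
    pizza dialog above.
    """
    lowered = utterance.lower()
    return [
        intent
        for intent, keywords in INTENT_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]

print(extract_intents("A pepperoni pizza and a margherita. Oh, make that Coke a Sprite."))
# Prints: ['order_food', 'change_order']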


Translation

We’ve probably all seen AI-powered translation at work. It is what makes possible those links on Twitter, LinkedIn, and Facebook that say “See Translation.” It’s what powers the translation feature of Google Chrome that translates an entire web page.

According to Google Research,4 automated translation systems, under some circumstances, are approaching or even surpassing the translation quality that you would expect from human translators. It is important to take such claims with a healthy pinch of salt, though. Those “circumstances” are important. If we are dealing with single words, short phrases, or web pages with small sections and not too complex concepts, automated translation can do an impressively effective job. The more sophisticated the concepts and the more layered the text, however, the less effective the translation.

A recent contest in South Korea pitting automated systems against professionals translating text from Korean to English and vice versa concluded that about 90% of the automatically translated text was “grammatically awkward” and not comparable with what a skilled translator would produce.5

As such, the same limitations that we have discussed so far apply here. Generic automated translation capabilities are impressive, but the more specific the domain, the less effective the translation model will be. If we are dealing with single words, simple commands, or short texts, automated translation offers a viable avenue. For more complex scenarios, organizations need to evaluate the tools available and consider whether to invest in their own tooling if commercially available translators are not enough.
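For experimentation, pretrained translation models are freely available. Here is a minimal sketch using the open source Hugging Face transformers library with one of the publicly available Helsinki-NLP models (English to French here, purely as an example; pick the model for your own language pair):

from transformers import pipeline

# Download and wrap a pretrained English-to-French translation model
# (pip install transformers sentencepiece).
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")

result = translator("The meeting has been moved to Thursday afternoon.")
print(result[0]["translation_text"])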



4 https://ai.googleblog.com/search/label/Translate.
5 www.koreatimes.co.kr/www/tech/2017/02/133_224449.html.


Natural Language Generation
The mirror image of natural language processing is the automated generation of new text.

We are far more likely to digest information and understand its implications if it is set in an appropriate narrative for us. We have all gone through that feeling of blanking out when presented with walls and walls of tabular data. Even with more pleasing graphs and charts, after a while it can feel like one blurs into the other. What we care about is the story that those tables and charts tell. Natural language generation (NLG) allows us to input structured, labeled data and get, as a result, a natural language document that provides an appropriate narrative around that data.

A particular strength of NLG is that it can produce multiple narratives from a single set of data, adapted or personalized to a specific situation. Take, for example, financial data. Analysts need to provide reports for all their different clients following the reporting of performance of a particular company or the release of data around a specific sector. The inputs, in this example, would be something like the annual company report and the portfolio situation of a specific client. An NLG system can then produce a narrative that describes what happened and how it affects a specific portfolio.

There are several levels of analysis that the NLG system performs to get to a final structured document. It needs to determine the relevant input data points that should be mentioned in the generated document. For example, did the company make a profit or a loss? What are the biggest expenditures? Where did sales mostly come from? The NLG system then manipulates what can be imagined as a very complex template that provides rules around the structure of the overall document and the structure of individual phrases. The end result is a document that is not only grammatically correct but structured in a way that is comfortable and natural for us to read.
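A toy sketch of that template-driven flow, with an invented function and invented data, might look like the following. Commercial NLG systems use far richer grammars and document-planning rules, but the two steps (select the relevant facts, then realize them as text) are the same:

def portfolio_narrative(company, profit_millions, top_market):
    # Content selection: decide which facts matter and how to frame them.
    direction = "a profit" if profit_millions >= 0 else "a loss"

    # Surface realization: slot the selected facts into sentence templates.
    return (
        f"{company} reported {direction} of ${abs(profit_millions):,.1f} "
        f"million this year. Sales were strongest in {top_market}."
    )

print(portfolio_narrative("Acme Corp", -12.5, "the EMEA region"))
# Prints: Acme Corp reported a loss of $12.5 million this year.
# Sales were strongest in the EMEA region.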

Another example is weather reporting. From a single weather data set, a news organization can produce localized weather reports for all its affiliates without requiring a writer to go through the data and come up with appropriate narratives.

Within organizations, NLG is increasingly being used to provide the narrative around company reporting in a more efficient and impactful way than charts and purely numerical reports. This can be particularly empowering for users who do not have the skills to do the necessary data analysis on their own.


Vision
Vision refers to a machine’s ability to process visual data and interpret it appropriately. It can range from something as “simple” as scanning a bar code to identifying objects within a photograph.

The advancement in the interpretation of images, as we discussed in Chapter 3, is what opened the floodgates for more general applications of AI. Unlocking the ability to correctly interpret an image enables so many applications, from autonomous driving to the ability to better monitor and manage the growth of crops across large areas.

The toolsets that enable training of data-driven models (the overwhelming majority being deep learning models) are potentially the most evolved across AI capabilities. This is a combination of the incredible amount of work that has gone into machine vision6 coupled with the suitability of deep learning architectures to handle raw image data.
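To illustrate how evolved the tooling is, here is a minimal sketch of classifying an image with a pretrained network from the open source torchvision library. The filename is a placeholder, and the model predicts one of the 1,000 generic ImageNet categories rather than anything organization-specific:

import torch
from PIL import Image
from torchvision import models
from torchvision.models import ResNet18_Weights

# Load a network pretrained on ImageNet and put it in inference mode.
weights = ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights)
model.eval()

# Apply the resizing and normalization the model was trained with.
preprocess = weights.transforms()
batch = preprocess(Image.open("office_photo.jpg")).unsqueeze(0)

with torch.no_grad():
    logits = model(batch)

best = logits.argmax(dim=1).item()
print("Predicted class:", weights.meta["categories"][best])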

There are powerful tools to label or annotate images that can then be used to train models and, unsurprisingly, there are several possibilities within a work environment.

Authentication and authorization: Face recognition can be used to identify people and provide access to office spaces. It can offer a more seamless experience and, under the right conditions, a more reliable security environment. It is not without its risks, though, as companies will need to store biometric data.7










6 Just in terms of investment in autonomous driving, 2018 saw venture capitalists committing 4.2 billion dollars—the key technology developed there is real-time AI-powered machine vision: www.axios.com/autonomous-vehicles-technology-investment-7a6b40d3-c4d2-47dc-98e2-89f3120c6d40.html.
7 In August 2019, for the first time, a large database of biometric data was found exposed on the open Web. It contained fingerprints and facial recognition data for millions of people and was managed by a company named Suprema. One of the biggest implications is that while one can change their password, if the digital equivalent of their fingerprint is stolen, there is no mechanism to replace it! www.forbes.com/sites/zakdoffman/2019/08/14/new-data-breach-has-exposed-millions-of-fingerprint-and-facial-recognition-records-report/#4cef3ee046c6.



Fraud detection: In industries such as hospitality and retail, machine vision can be used to detect when items are not properly processed at point-of-sale systems. It can monitor employees or clients as they are passing objects over barcode readers.8

Asset monitoring and management: The analysis of images of physical assets can reveal where faults are close to occurring and optimize the maintenance of workspaces.

Digitization and categorization of analog documents: We are a long way away from becoming entirely digital, and we have a swath of historical documentation that we still need to deal with. Machine vision can be applied both to categorize documents (e.g., identify receipts, sales reports, pay slips) and to digitize them so that the information within them is immediately accessible, as in the sketch below.9
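A minimal sketch of that last capability, using the open source pytesseract wrapper around the Tesseract OCR engine (pip install pytesseract pillow, plus a separate Tesseract installation; the filename is a placeholder):

from PIL import Image
import pytesseract

# Run optical character recognition over a scanned document image.
text = pytesseract.image_to_string(Image.open("scanned_receipt.png"))
print(text)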

Just as with NLP, there are powerful tools readily available to test out ideas with machine vision. A great example I’ve encountered, one that tells the story of how accessible tooling has become, is of an intern building a fully digitized system for monitoring parking availability for an entire workforce over a single summer. They used the data coming from security cameras to figure out what parking spaces were available based on the movement of cars, exposed that information on the company intranet, and set up parking boards for everyone to see. This improved everyone’s experience of coming to work and required, all said, minimal effort.

As with everything else, care needs to be taken to ensure that the capability you think you have developed can translate to wider deployment. Machine vision is notorious for providing false positives or completely missing the target. When dealing with life-and-death situations, such as autonomous driving, that is simply not acceptable. However, when the goal is to make an existing process more efficient (such as letting people know whether there is a parking space), some errors can be tolerated. Similarly, if we are using images to detect people in pictures in order to better classify a catalogue of media images, we are saving ourselves time and don’t mind some errors. If we are using face detection technologies coupled with emotion recognition to determine the emotional state of a workforce,10 we are definitely overstepping what the technology can usefully achieve and risk alienating users.

8 Beyond fraud detection, vision is also a core capability to automate the entire retail experience, as Amazon demonstrated with their automated grocery store, Amazon Go. The solution there relies heavily on cameras to track how clients interact with items in the store.
9 A great example of on-demand digitization is a new feature in Microsoft Excel whereby you can point your smartphone camera at tabular data in a printed document and have that converted to digital spreadsheet data.


Custom Capabilities
Language and Vision are generic capabilities with wide applicability across different domains. Exploiting them appropriately across your organization can give you a significant advantage. There is space to innovate in how you use them and where you apply them, but it will become increasingly harder to compete with others on building better NLP or vision systems. The effort required will likely not justify the potential benefits for most companies. Ultimately, we can expect powerful NLP and vision capabilities to become the minimum standard necessary, rather than a competitive differentiator.

Instead, an area where there is possibility for your organization to differentiate and create more of a moat around your competitive advantage is in creating your own “custom” capabilities. These are ways you can represent, reason, and act in the world that are specific to your organization, because they are the result of models that you have devised and data that only you own. I like to think of these as your organizational superpowers. Just like hero superpowers, they are the things that separate you from the other superheroes. Some heroes can see better, while some can jump higher or pack a mightier punch. Those are their super capabilities. The question is: what is your organization’s superpower when it comes to AI capabilities?

To develop a custom capability, you need to create the right circumstances. Just like the Hulk, Iron Man, or Spider-Man, you need to walk into a lab and mess around with the different ingredients to see what can come out of them.

For example, there may be something specific in the way you collect customer data that allows you to model and reason about the behavior of your clients in a way that others simply can’t. You may have developed a culture and put in place a process that means your team provides structured feedback in a consistent manner. This enables you to get a better overall understanding of team well-being and what needs to change, leading to a happier and better performing workforce.



10 This is an application of vision that has been deployed in certain schools in China with the aim of classifying students based on six behavior categories, with the goal of identifying students who were not sufficiently immersed in study. Such applications of technology should rightly raise alarms: https://www.theglobeandmail.com/world/article-in-china-classroom-cameras-scan-student-faces-for-emotion-stoking/.



Perhaps, just like Superman coming from a different planet, you are entering a new market and can bring to it a new perspective and new capabilities, in terms of how a process can be automated, that incumbents have simply not considered or are too comfortable to care about. The way fintech start-ups are disrupting traditional banks is a good example of this. They don’t carry any baggage and are approaching the problem from a technology-first perspective in a way that the incumbents find hard to achieve.

The crucial element is to recognize that what you are looking to develop is a capability: a way to understand and reason about a specific aspect of the world. Starting from there, you can then explore the techniques available and start combining them to get to a specific solution.


From Capabilities to Applications
AI capabilities are the ways that you can understand and manipulate your environment. Core capabilities such as Language and Vision offer a wide array of opportunities to organizations. There are easy-to-access tools, making the barrier to entry low. The challenge lies in identifying the most fruitful ways of using them, cognizant of their limitations. Ultimately, these core capabilities will become part of everyone’s toolkit. What is important is to grow the skills and experience to use them effectively now, in order to gain some first-mover advantage.

In addition, you can start thinking of what custom capabilities you can develop. These organizational “superpowers” can be exclusively yours because they depend solely on how you exploit the innovation capability of your people and your understanding of the world (your knowledge and your data). The more mature AI-powered applications become, the more important these custom capabilities become, as they are the ones that will provide true differentiation.
Part II

The Applications of AI in the Workplace

Chapter 6 | The Digital Workplace


Digital transformation.

Did the mere mention of this darling catch-all phrase of consultants generate a slight inner groan? I know it has that effect on me, and I am one of those consultants, working for consulting companies that invariably mention “digital transformation” on their web site’s home page.

There is nothing inherently wrong with the phrase itself. Deep down we all know that. To transform processes through the effective use of digital technologies is a very sensible thing to do. It is something that every organization should always be doing. The reason the phrase produces dread is that it has been thrown around so much by earnest marketers of consulting services that it now carries a certain amount of baggage. Buzzwords like digital transformation, and its cousin, digital strategy, conjure up images of armies of consultants producing lengthy reports about what one ought to do to improve their workplace, with little practical advice about how to go about achieving that.

For the purposes of this chapter and what is coming next in the book, I ask you to put all that baggage aside. In this chapter, we are going to discuss the digital workplace, what that means, and how an understanding of the digital workplace forms the foundations of a strategy for the AI-powered workplace.


