SV013: TRANSCRIBING THE WORLD THROUGH ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

W/ SAM LIANG

31 October 2019

On today’s show, we interview Sam Liang, the Founder and CEO of Otter.ai, a highly disruptive collaboration and productivity tool. Otter.ai has won many of the most prestigious awards in tech, given by companies such as Google, Fast Company, PCMag, and more. Before founding his startup, Sam worked at Google, where he led the location-based services unit and won an award for the “Bluedot Mobile Location System.” Sam received his PhD from Stanford, where he focused on Internet distributed systems and worked under the professor known for being the first investor in Google.

IN THIS EPISODE, YOU’LL LEARN:

  • How big the market is for a program that transcribes and stores conversations
  • How Otter works and how it continually improves using artificial intelligence
  • What the next few years look like for the use of data, and how programs will become more and more accurate over time
  • How a transcription tool will revolutionize what we know about ourselves

We would like to give special thanks to Rick Quan, who made the introduction to Sam Liang that allowed this amazing interview to happen.

HELP US OUT!

Help us reach new listeners by leaving us a rating and review! It takes less than 30 seconds and really helps our show grow, which allows us to bring on even better guests for you all! Thank you – we really appreciate it!

Tweet your comments about this episode directly to Shawn Flynn and the rest of The Investor’s Podcast Community using #TIPSiliconValley.

TRANSCRIPT

Disclaimer: The transcript that follows has been generated using artificial intelligence. We strive to be as accurate as possible, but minor errors and slightly off timestamps may be present due to platform differences.

Shawn Flynn  00:02

On today’s show, we interview Sam Liang, who is the Founder and CEO of Otter.ai, a highly disruptive collaboration and productivity tool that has won many of the most prestigious awards in tech, given by companies such as Google, Fast Company, PC Mag, and more. We will talk about how Otter works and how it continually keeps improving using artificial intelligence, what the next few years look like for the use of data and how programs will get more and more accurate over time, how a transcription tool will revolutionize what we know about ourselves through the changes in our words and our speech patterns, and much more on today’s episode. Enjoy!

Intro  00:45

You are listening to Silicon Valley by The Investor’s Podcast where your host, Shawn Flynn, interviews famous entrepreneurs and business leaders in tech. Discover how money is made in Silicon Valley and where tech is going before it gets there.

Shawn Flynn  01:08

Sam, thank you for taking the time today to be on our show.

Sam Liang  01:11

Thank you, Shawn, for having me here.

Shawn Flynn  01:13

Now, Otter has amazed me. I was looking up your company online and I saw it’s won several awards, including Fast Company’s Best New App of 2018 and PC Magazine’s Best 100 Apps of 2018. There’s a ton of awards that you guys keep winning.

Sam Liang  01:29

Yeah, thank you for mentioning those awards. Actually, a week ago today, on Monday, the Apple App Store featured us as “App of the Day” in the US market. I think that’s one of the biggest awards you can get from the Apple App Store. It’s definitely gaining a lot of momentum. The app itself is only one year old. We released the beta version of the app in February 2018. Before that, we had been working on this for more than two years; the company started in early 2016. We’ve been working really hard in the background on the AI technologies to do speech recognition with high accuracy *inaudible*. All of this started a few years back, when we saw the need to take meeting notes automatically and to capture conversations, which billions of people have every day, all the time. An average person may speak 800 million words in their lifetime. Talking is the main way people communicate with each other, right? But most conversations are actually lost in the air. So we saw a huge need, and that’s why we started working on this.

Shawn Flynn  02:52

Can you go back even further, before Otter, to your journey from Mainland China to Silicon Valley?

Sam Liang  03:01

I was born in Beijing, China, and I grew up there. I went to Beijing University, studying computer science. After graduation, I came to the United States. At first I applied to Stanford, which was my dream school for graduate school, but they didn’t think I was good enough. So I went to the University of Arizona in Tucson, which I actually really liked. I spent two years there studying computer science. After that, I went to Atlanta briefly, but then I started working as a summer intern at Silicon Graphics in Mountain View. After the internship, they said, “Sam, you did a good job. Do you want to stay?” I said, “Yeah, sure. I’d love to stay.” That was 1994. Silicon Graphics was like the Google of that time, a dream job for me. That’s why I’ve stayed in Silicon Valley ever since. That was 25 years ago.

Sam Liang  04:00

Interestingly, a few years later, I finally got accepted by Stanford and pursued my PhD there, focusing on distributed systems. I was lucky to join the research group of Professor David Cheriton. For people who are not familiar with him, he is the professor who wrote the very first check, for $100,000, to Larry Page and Sergey Brin to start Google. Larry and Sergey were maybe 22 or 24 years old at that time, so receiving that much money was a big deal. It’s not a big check for them today, but at that time it was. David helped them build the initial Google strategy, and a few years later, Google went public. I was lucky to work with David on my thesis on large-scale distributed systems. Later on, I joined Google, where I led the Google Maps location service from 2006 to 2010. We built the initial blue dot system that people use on their mobile phones, the blue dot that shows your current location and helps you navigate. It was really fun. Steve Jobs personally demonstrated that feature on the first iPhone back in 2007, showing your location and navigating. It’s not a big deal today because everybody takes it for granted, but back in 2007, it was a huge deal.

Sam Liang  05:33

But I always wanted to do a startup. Finally, I decided to quit Google in 2010 and started my first startup in Palo Alto. The first year was really hard. I had no funding and no salary. I was doing the Android coding myself, and the back-end coding as well. A friend of mine was helping me part-time, and he later became our co-founder. That company went pretty well, and after a few years, it was successfully acquired by Alibaba. I worked at Alibaba for a couple of years and then decided to quit again and pursue something even crazier. That’s the origin of Otter. Again, as I mentioned, billions of people in the world talk every day for several hours to communicate: in person, on the phone, in video conferences. But most of those conversations are lost, which is a big waste of time.

Shawn Flynn  06:30

My first encounter with Otter was at an event where it was being used to transcribe the speaker on stage in real time for the audience. You mentioned note-taking, but walk me through how it’s being used right now.

Sam Liang  06:45

Initially, that wasn’t the use case we planned to focus on. As I mentioned, I’m really fascinated by conversation in general. People talk so much; they have to communicate. There’s no easy way for them to keep track of it. A lot of conversation happens spontaneously. You don’t necessarily schedule a meeting; you might run into somebody in the hallway, at a Starbucks, or in a restaurant, and you start talking. All that information is very precious, and we want to capture as much of it as possible. It’s a part of your memory, a part of your life. And we have users who found new use cases we didn’t anticipate.

One of them is using it at large events. The first major event we did was TechCrunch Disrupt in San Francisco, which is a very big event. Some of TechCrunch’s reporters and writers were already using Otter to help transcribe their own interviews, and they found it really useful. So later they said, “Wow, this is so cool. Why don’t we use it for our conference?” That turned out to be really useful for several purposes. One is accessibility. There are people who are hard of hearing or have some auditory processing difficulty, and presenting the speech in written form is very helpful for them. It’s also useful for an international audience whose native language is not English. They may not be able to quickly understand spoken English, but they can understand the written words better. Another big benefit they found is search. You can search for keywords like self-driving cars, Dropbox, or privacy, so you can quickly find who talked about privacy or self-driving cars and how many times those topics were mentioned, for example. Otter also shows a word cloud for each speech, so you can get the gist of each speech quickly. So yeah, users found a lot of creative use cases for Otter, many of which we didn’t anticipate.
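The keyword search and word cloud Sam describes can be illustrated with a minimal Python sketch. This is not Otter’s implementation: the sample transcripts, the hypothetical `count_mentions` and `word_cloud_weights` helpers, and the small stop-word list are assumptions chosen only to show the idea of counting keyword mentions per speech and ranking frequent words for a cloud.

```python
from __future__ import annotations

import re
from collections import Counter

# Illustrative stop-word list (assumption); real systems use much larger lists.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "we", "is", "are", "from", "into"}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def count_mentions(transcripts: dict[str, str], phrase: str) -> dict[str, int]:
    """Count whole-phrase, case-insensitive mentions of `phrase` in each speech."""
    pattern = re.compile(r"\b" + re.escape(phrase.lower()) + r"\b")
    counts: dict[str, int] = {}
    for title, text in transcripts.items():
        hits = len(pattern.findall(text.lower()))
        if hits:  # only report speeches that mention the phrase
            counts[title] = hits
    return counts


def word_cloud_weights(text: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Return the most frequent non-stop-words: the raw input for sizing a word cloud."""
    words = [w for w in tokenize(text) if w not in STOP_WORDS]
    return Counter(words).most_common(top_n)


# Hypothetical transcripts keyed by speech title.
transcripts = {
    "Keynote": "Privacy matters. We built privacy into self-driving cars from day one.",
    "Panel": "Self-driving cars need maps. Maps and more maps.",
}
print(count_mentions(transcripts, "self-driving cars"))  # {'Keynote': 1, 'Panel': 1}
print(count_mentions(transcripts, "privacy"))  # {'Keynote': 2}
print(word_cloud_weights(transcripts["Panel"], top_n=1))  # [('maps', 3)]
```

Counting whole-phrase matches per speech answers “who talked about privacy, and how many times”; the frequency ranking is what a word-cloud renderer would scale words by. A production system would also need stemming, phrase detection, and a far larger stop-word list.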

Shawn Flynn  09:12

Tell us a little bit about how Otter works. When I first started using Otter, I would have to highlight a few lines and then type my name. But now when I upload an audio file, it automatically knows who I am. Can you talk about what’s going on in the back end?

Sam Liang  09:32

The user interface is very simple by design. We hope it’s intuitive, although there is still a lot of room for us to improve. However, the AI technologies behind it are extremely sophisticated. English, or really any language, is very difficult to transcribe, because the way people talk varies a lot from person to person, and people talk in different rooms with different acoustic environments and different reverberation. When you’re talking in a car, in a Starbucks, or in a restaurant, there is a huge amount of background noise. So we built a very sophisticated deep-learning system, built the model and the algorithms, and used millions of hours of audio data to train the model so that it’s able to handle all kinds of variation in talking speed, volume, reverberation, and accent. For me, although I’ve lived in Silicon Valley for 25 years, I still can’t fully correct my own accent, and I’m trying to improve that. But Otter is able to handle most of this correctly. It’s not perfect; nobody is perfect yet. But Otter provides some of the best accuracy in the world, which was confirmed by Zoom’s own benchmark testing.

Sam Liang  11:13

The other part you mentioned, voice recognition: Otter can recognize each person’s voiceprint. When you talk versus when I talk, it’s able to separate the two speakers based on a lot of features. We look at the frequency of your pronunciation and the way you talk. We use a lot of features to separate speakers, so that once you label about a minute of each person’s speech, we create a voiceprint data structure in our system to remember that person’s voice. Later on, when we hear the same person, we can identify them. This is conceptually similar to facial recognition. When you upload a photo to Google Photos, for example, it can ask you, “Hey, who is this?” You say this is Shawn, and it remembers. So next time it sees a picture of the same person, it can guess, oh, this is Shawn. We do similar things with voice. In a meeting, once you label a speaker, Otter is able to identify speech from that speaker later on, which is extremely useful when you need to search for who raised concerns about privacy or who talked about self-driving cars. You can analyze the speech from each speaker and identify each speaker’s concerns and the topics each person likes to talk about. So analytics is another huge benefit.
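Conceptually, the enroll-then-identify flow Sam compares to Google Photos can be sketched as a nearest-match search over voice embeddings. The sketch assumes some acoustic model has already turned each labeled speech segment into a fixed-length vector (the toy three-number vectors below stand in for that). The hypothetical `VoiceprintStore` class, the averaging step, cosine similarity, and the 0.8 threshold are all illustrative assumptions, not details Otter has disclosed.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class VoiceprintStore:
    """Hypothetical store mapping each labeled speaker to one averaged voice embedding."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # minimum similarity to accept a match (assumed value)
        self.voiceprints: dict[str, list[float]] = {}

    def enroll(self, name: str, segments: list[list[float]]) -> None:
        """Average the embeddings of the segments the user labeled as `name`."""
        dim = len(segments[0])
        self.voiceprints[name] = [
            sum(seg[i] for seg in segments) / len(segments) for i in range(dim)
        ]

    def identify(self, embedding: list[float]) -> str | None:
        """Return the most similar enrolled speaker, or None if nobody clears the threshold."""
        best_name, best_score = None, -1.0
        for name, voiceprint in self.voiceprints.items():
            score = cosine_similarity(embedding, voiceprint)
            if score > best_score:
                best_name, best_score = name, score
        return best_name if best_score >= self.threshold else None


store = VoiceprintStore()
store.enroll("Shawn", [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]])  # toy embeddings of labeled segments
store.enroll("Sam", [[0.0, 1.0, 0.2], [0.1, 0.9, 0.1]])
print(store.identify([0.92, 0.08, 0.02]))  # Shawn
print(store.identify([0.0, 0.0, 1.0]))  # None (an unknown voice)
```

The threshold is the design choice that lets the system say “I don’t know this person” instead of forcing every new voice onto the closest enrolled speaker, much as a photo app asks “Who is this?” for an unfamiliar face.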

Shawn Flynn  12:58

It sounds like a massive amount of computing power has to be used for this. What’s stopping Microsoft or another huge corporation from competing with you in this realm?

Sam Liang  13:09

It’s a good question, and one every investor asked us. When we started this three years ago, people asked me, “Hey, isn’t Alexa already doing that? Isn’t Siri doing that? Isn’t Google doing that? Why are you doing this again? Are you able to survive?” It’s true that a lot of players are doing speech recognition, but traditionally, their focus was different. When you use Siri, first you need to say a hot word like “Hey, Siri” to wake up the robot. Then you ask a simple question like, “What’s the weather tomorrow?” or you issue a command like “Set an alarm at 3pm.” It is a chatbot in the sense that a human being talks to a robot and the robot either answers a short question or executes a short command. This is actually not very natural for most human beings. For thousands of years, people have talked to each other. They didn’t talk to robots because robots didn’t exist. It’s very natural for people to talk to each other, to have a natural conversation, but nobody was focusing on that use case. When we started, we decided to focus on it, because that’s where people spend most of their time: talking to each other rather than talking to a robot. So our technology is built from the ground up to focus on human-to-human conversation.

Shawn Flynn  14:48

The data sets are that… what’s different?

Sam Liang  14:51

The data sets are different. Google, for example, focused on voice search for many years, and their training data is mostly short search questions. They do really well on that, and we don’t want to compete with Google there. But if you look at Google’s system, they’re not focusing on the use case we are working on. They may have some people working on it, but Otter is the leader in this space at the moment. Interestingly, we know a lot of people inside Google are using Otter. John Doerr, who has been a Google board member for 20 years, uses Otter himself for some of his meetings, and he even told his daughter to use Otter. So that’s a huge endorsement from a Google board member.

Shawn Flynn  15:47

Does that mean because you were able to carve out this niche for yourself that you have this huge moat around you that would take someone else a long time to catch up? Or is it because of the data you’ve collected in the last few years of people using your app that your data sets are so much better than someone else’s? What’s stopping anyone else?

Sam Liang  16:09

I wouldn’t say anything would stop them; it’s just a question of how fast they can move. A lot of algorithms are open, but there are also a lot of trade secrets and a lot of engineering trade-offs. Even if you know the algorithms published in academia, you can get something working, but getting to 95% accuracy is very difficult. We have a head start in this space. We started earlier than most people, and we have this product released in the market, so as millions of people use it, they contribute more data to us. Of course, everything is encrypted; the machine may crunch the data and learn from users’ corrections as well. We allow users to correct errors, so every correction helps the machine *inaudible* how to transcribe more accurately. So we have this virtuous cycle: when more people use it, more feedback comes to our engine, and the engine runs our own algorithms to take advantage of that feedback to improve and fine-tune the model and make it even more accurate. So far, we have processed nearly 10 million meetings. That’s a huge amount of data. I don’t know how many others have that quantity.
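The interview does not define the “95% accuracy” figure. One common way to measure transcription accuracy is word error rate (WER): the fewest word substitutions, deletions, and insertions needed to turn the machine transcript into a reference transcript, divided by the number of reference words, with accuracy often quoted as one minus WER. The sketch below assumes that definition and treats a user-corrected transcript as the reference; it illustrates the metric only and is not Otter’s scoring method or training pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = fewest edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + substitution,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: the user's correction is the reference, the engine output the hypothesis.
corrected = "set an alarm at three pm"
machine = "set an alarm at tree pm"
wer = word_error_rate(corrected, machine)
print(f"WER = {wer:.3f}, accuracy = {1 - wer:.1%}")  # WER = 0.167, accuracy = 83.3%
```

Under this definition, one wrong word in every 20 reference words is a WER of 0.05, which would correspond to the 95% figure mentioned above.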

Shawn Flynn  17:43

A little bit ago, you mentioned that when you first started the company, investors were asking you why Microsoft or Google wouldn’t compete and enter this market. How difficult was it to get initial funding for your company? What was that first year like?

Sam Liang  17:59

It did take some time to convince people, especially before you have a prototype ready, when it’s still paperware rather than a product. For me, it was a little bit easier this time because I had done a startup before, so I already had a good track record. I was also lucky to have met Tim Draper. I first met him back in 2010, when I was doing my first startup. I had a lot of trouble raising funding for that startup because I had just quit Google. As an engineer, I had no business background and no track record. And when I told people that I was going to build this mobile startup, people got scared. They said, “Oh, you were working on location data. It is too creepy and intrusive. Nobody will use your system because they don’t want to be tracked.” That is, until I met Tim Draper.

When I showed him a demo, he got really excited. He said, “Wow, this is what I’ve been looking for for a long time. I want to create a lifelog for myself. I want to trace my own journey.” He got really interested and gave me an EIR, entrepreneur-in-residence, position at DFJ. I wasn’t paid and had no salary, but they gave me a really nice office. When I started to work on Otter, I went to Tim and told him about our ideas, and he said, “Yeah, okay. I want to take meeting notes automatically, and there’s no product in the market.” He was very supportive, and I really appreciate that. People know him as a sort of crazy person in some sense. I have a similar personality: I want to do something crazy, which is that eventually, Otter can capture all the voice conversations in the world. My whole life could be recorded and searchable. Of course, I’m not going to publish everything on Facebook and Twitter; it’s just for myself. I’d like to be able to search everything I heard.

Shawn Flynn  20:13

Before we started this interview, I asked you about some of the use cases, and you mentioned universities. Could you talk about that use case?

Sam Liang  20:22

Lots of students from Berkeley, Stanford, and other schools downloaded the app and use it in lectures, and graduate students use it mostly for research meetings. We also got a lot of emails and phone calls from the accessibility departments of many universities, which we didn’t anticipate when we started the company. It turned out that a school like UCLA has hundreds of students who are either hard of hearing or have other learning differences that require the school to provide note-taking services. UCLA told us they spent more than $150,000 to hire note-takers for students in need. So when they discovered Otter, they were really excited. They want to bring AI into education, and they found that their students who are hard of hearing would be the first to benefit from this new technology.

Shawn Flynn  21:33

You’ve talked about AI quite a bit in this interview. Could you go into a little more detail about what AI means to you and how you define it?

Sam Liang  21:42

Right, this is the general question of what AI, artificial intelligence, is. It is sort of a misnomer, in the sense that what we call AI today, people won’t consider AI 10 years from now, because it will be a solved problem. I think *inaudible* IBM’s Deep Blue beat the human chess champion. At that time, people considered that AI, because it was the first time a machine could beat the human champion. But today that’s taken for granted because everybody already understands the algorithm, so they don’t consider it AI anymore.

So AI is an interesting term: it’s only when an algorithm solves a mysterious problem that people consider it AI. Of course, deep learning is a new framework that helps us a lot and helps society in general. With deep learning, we are able to create a speech recognition engine that is way more accurate than what was possible a few years ago. If you had started this company five years ago, the accuracy wouldn’t have been good enough for everyday use cases; there would have been too many errors for people to tolerate. Today, we are able to achieve very high accuracy. Again, it is not a hundred percent perfect yet, but people find it usable.

Shawn Flynn  23:14

So you said that five years ago, with the computing power at that time, it wouldn’t have been accurate enough. What do you see five years from now?

Sam Liang  23:24

Five years from now, I think it will be near perfect. The reason is this: for any human being, the knowledge you have is limited. However, when you use this huge engine running in the cloud, you can potentially have millions or billions of machines working together, using the entire knowledge and history of human beings to understand every single sentence it hears. So potentially it will be able to surpass… I don’t know exactly when it will happen, but I’m very sure it will happen. Of course, the difficult part we haven’t solved today is understanding. There’s still a long way to go for a machine to fully understand human speech, because speech is highly ambiguous. It’s highly contextual.

When you talk to a new person, how much do you understand that domain? How much do you understand that person’s background? Moving forward, as Otter uses more background information, it will *inaudible* and understand each speaker better. For example, Shawn, the engine could look up your LinkedIn page and your Facebook page, so it knows what you talked about recently. When it hears certain words from you, it will match them with that background information and can better guess what you’re talking about and what you want to say. In the future, it could even be scary: before you open your mouth, the machine could almost predict what you’re going to say, based on your history.

Shawn Flynn  25:13

Everything that I’ve been typing on Facebook and LinkedIn all these years: is it going to be combined, and will all systems have access to it? Will they be able to scrape this page and that page for the data? Or will each site really clamp down on its data?

Sam Liang  25:33

Yeah, privacy is a big topic today, right? Facebook is in some trouble these days. For us, we provide this capability, but it’s up to the user to control who they want to share the information with. I see it as almost like a fitness app that tracks my steps and a lot of my vital data. It’s not public; it’s just for myself. I could share some or all of it with my doctor, so the doctor can detect cancer earlier for me, and I can live a few more years. Otter’s data is similar: the user has full control over what can be recorded, what cannot be recorded, and what has to be erased, and after recording, over what can be shared with colleagues and what can be shared with the public. As long as the user has full control, I think this is a good thing. The AI could be good enough to listen to me and tell me how the way I talk can be improved, like, “You say too many, you know, stop words and all that.” So it can help me improve my speaking. It can improve my relationships with other people because it can tell me when I respond to questions in the wrong way and get people annoyed. In terms of coaching and mentoring, it just has too many benefits to ignore. Of course, you always want to find a balance between privacy and data access, but I think the two can be reconciled.

Shawn Flynn  27:16

So with that data balance in the future, say we’re at a meeting for the first time and we’re going to use Otter.ai. Would I give Otter permission to scrape all my data from LinkedIn to get a better idea of each word I’m saying, or would it be automatic because I have an Otter account? What does the future look like for the engine behind Otter?

Sam Liang  27:41

Eventually, I think it will all be done automatically. It will take a few years for people to get used to this type of situation, but if you look at history, I firmly believe that this will happen. I don’t know how many more years it will take for people to be fully comfortable. But if you look back 30 years, there was no Facebook and no Twitter. Today people are sharing so many things with each other. It’s a more open society, and I’d say that’s a good thing, although there are some problems to solve. People understand each other better because of all the social networks and because more things are being recorded. Of course, with GDPR and all that, the users have to be in control, but I think people can live a better life when more data is logged and analyzed. As I mentioned, it’s almost like your health and fitness app. When more data is logged and analyzed, the doctor can diagnose problems with your health and help you improve it. It’s the same with voice: it helps with your professional life. It helps with your mental life, too. By listening to your conversations, it can almost detect depression: from the way you were speaking last week, it can tell something bad is happening, right? Your mood is not good.

Shawn Flynn  29:11

What is your dream for Otter for what it will become?

Sam Liang  29:15

My dream is for Otter to become pervasive, to become universal. I think eventually, conversations can be captured, selectively shared, searched, and analyzed, as I mentioned, right? A world where I can see the topics I talked about and the things other people told me, which helps me improve my own life and helps corporations improve their productivity and collaboration. There are actually a lot of studies showing that in enterprises, people spend 30% or even 40% of their time in all kinds of meetings. It’s a huge number, a huge investment. If you pay somebody a hundred thousand dollars a year, you’re actually paying them $30,000 to go to meetings. Then you want to know, how effective is that investment? What’s the return on that investment? Are those meetings effective? Did people make the right decisions? Or did people just waste their time talking about random things?

Shawn Flynn  30:24

Sam, is there anything that we didn’t cover that you wish people at home knew about?

Sam Liang  30:30

I would encourage everyone to go ahead and download Otter from the Apple App Store or Google Play, or use it on your laptop by going to otter.ai to sign up and give it a try. Use it and send us some feedback. We love to hear from our users about how we should improve.

Shawn Flynn  30:53

Great, so we’ll have those links in the show notes. And I also want to thank Rick Quan, who actually made the introduction to Sam Liang, that was what facilitated this whole interview, and he has an amazing documentary out there on *Edley. Sam, thank you again for your time today, and we look forward to having you back on the show in the future.

Sam Liang  31:13

Thank you, Shawn. I really enjoyed this. Thank you.

Outro 31:15

Thank you for listening to TIP. To access our show notes, courses, or forums, go to theinvestorspodcast.com. This show is for entertainment purposes only. Before making any decisions, consult a professional. This show is copyrighted by The Investor’s Podcast Network. Written permissions must be granted before syndication or rebroadcasting.
