Tag: speechmatics

Supporting the growth of emerging markets

India is considered one of the largest emerging markets in the world, with expected growth of 7.2% for 2017 and 7.7% for 2018, as explained by Finance Minister Arun Jaitley, and the second-largest population in the world.

With 521,000,000 people speaking Hindi as their primary language (according to a 2016 report), we decided to build Hindi to support the growth of the Indian market. Not many companies have tackled Hindi for speech-to-text processing, due to the different punctuation used and the limited amount of data available.

Using our language and acoustic framework, we built Hindi in a matter of days, using minimal data sets. The build was possible thanks to our experience of training 28 previous languages, our use of machine learning and our knowledge of automatic speech recognition technology.

You can try out our languages for free here.

Ian Firth, Speechmatics

Four cities and a lot of Sichuan

An event with DIT is always an exciting prospect, and the recent mission to China was no exception: it truly delivered beyond expectation.

We visited Guangzhou, Shenzhen, Chengdu and Guiyang in the space of a week. It was amazing to see the pace of technology innovation in China and how the country is pushing its 2020 innovation agenda.

The first day in Shenzhen was dedicated to meeting Huawei, with an interesting presentation from their technology team followed by a chat with the Huawei innovation team, where we discussed Speechmatics’ real-time ASR capabilities.

We then headed to Chengdu, the home of Sichuan food and hotpots! We spent the morning meeting several interesting companies and the afternoon at the UK-China Big Data Collaboration Seminar, which was an opportunity to meet many more interesting Chinese companies. We then spent the evening with Mr & Mrs Li, who treated us to astounding local cuisine and shows.

Guiyang, a fascinating city, was next on the agenda. It is one of the core focus areas for Chinese innovation and is in the midst of building a large research park. We attended the Big Data Expo, where we met the Chinese press and iFlyTek, who showcased their real-time captioning and education systems and discussed their translation device. We also met some of the biggest Chinese companies: Alibaba, Baidu and Tencent.

We experienced great warmth and humour throughout the visit. We are excited about the prospect of working with Chinese companies and can’t wait to visit again soon. DIT and CBBC were a great support, running a seamless event that encapsulated why the UK and China should collaborate.

We’re excited for London Technology Week 2017 next week, with more DIT-run events strengthening collaboration between the UK and the rest of the world.

Benedikt von Thüngen, Speechmatics

What do AI and machine learning actually mean?

I recently read an article on how language led to the Artificial Intelligence revolution and the evolution of machine learning, and it got me thinking. To start, it’s good to understand what we are talking about.

Wikipedia says ‘Artificial Intelligence (AI) is intelligence exhibited by machines. In computer science, the field of AI research defines itself as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of success at some goal’. This is a much harder goal to achieve than machine learning, which is ‘the subfield of computer science that, according to Arthur Samuel in 1959, gives “computers the ability to learn without being explicitly programmed”’. There is much confusion about the perceived ‘buzz words’ of AI and machine learning: many companies say they use AI, whereas in practice they have only used machine learning, which is quite different and is not an ‘intelligent agent’ in the AI sense.

Machine learning has transformed natural language processing (NLP); in fact, the whole area of computational linguistics is that of applying machine learning to NLP. This is a different question from whether AI needs NLP. It’s perfectly possible to contemplate an AI system that we don’t communicate with in a natural language (it could be a formal language), but natural communication with an AI is going to need natural language.

So, what’s the story of machine learning applied to speech recognition?

The article quotes Rico Malvar, distinguished engineer and chief scientist for Microsoft Research: “Speech recognition was one of our first areas of research. We have 25-plus years of experience. In the early 90s, it actually didn’t work.” I feel it is worth pointing out that this is potentially misleading about the history of speech recognition. In the early 90s, speech recognition did work for a variety of specific commercial applications, such as command and control or personal dictation (Dragon Dictate, for example).

However, in the 90s there was an interesting interplay between computing power and dataset size. In the DARPA evaluations we showed that we could build useful large-vocabulary speech systems for a variety of natural speech tasks using both the standard hidden Markov models and neural networks. Indeed, my team at the time pioneered the use of recurrent neural networks in speech recognition (which can be considered the first deep neural networks). The funding behind these evaluations resulted in extensive data collection so that we could build better speech recognition systems.

It was relatively straightforward to apply hidden Markov models to these large data sources (we just bought a lot more computers), but neural networks couldn’t be scaled so easily to more than one computer. As a result, all the good work in neural networks was put on hold until GPUs arrived and we could train everything on one computer again. Some, such as Malvar, viewed this as “The deep neural network guys come up and they invent the stuff. Then the speech guys come along and say, ‘what if I use that?’.” But in my opinion speech was the first big task for neural networks, with image and text coming along later (Wikipedia’s view of history).

However you view history, the use of deep neural networks combined with the progression of computing power has drastically improved speech recognition technology, which is now easily consumable by the masses, with global reach across a multitude of applications and use cases.

Tony Robinson, Speechmatics

Unlocking the power of machine learning

Machine learning is the cornerstone of many technologies we use every day; without it we would be interacting with computers as we did 20 years ago – using them to compute things. With the advent of more powerful processors, we can harness this computing power and enable the machines to start to learn for themselves.

To be clear, they do still need input, they still follow patterns and they still need to be programmed; they are not sentient machines. They are machines which can find more information than just 2+2=4. What machine learning is very useful for is extracting meaning from large datasets and spotting patterns. We use it here at Speechmatics to train our ASR to learn a different language on significantly less data than would have been possible even 15 years ago.

We are now in a world which is finding more and more uses for machine learning (eventually the machines will find uses for it themselves, the ‘singularity’, but we aren’t there yet!). Your shopping suggestions, banking security and tagging friends on Facebook are all early uses for it, but the sky is the limit. There is no reason why eventually the Jetsons’ flying cars wouldn’t be powered by machine learning algorithms, or why I, Robot-style cars couldn’t be controlled by a central hub. Machine learning could also be used to help humans: to assist air traffic controllers by directing planes to a holding pattern, or to help teachers identify struggling pupils based on test results.

Coupling machine learning with neural networks (large networks of processors which begin to simulate a brain) can unlock even more power. Whilst at Speechmatics we like to think we are changing the world, the reality is that research into deep neural networks and machine learning is starting to unravel the way some of the most vicious illnesses operate. Modelling the mechanisms of HIV and AIDS, as well as simulating flu transmission, can both lead to a better understanding of how these illnesses operate.

In a recent article on the possibilities of machine learning, The Royal Society called for more research into machine learning ‘to ensure the UK make the most of opportunities’. The progress made so far is astounding and, with an exciting prospect ahead, we at Speechmatics are continually innovating and researching artificial intelligence.

What is most exciting is to think how things could end up looking in the future. Today your mobile phone has more computing power than NASA used to put a man on the moon. The phone which you use to combine candies into rows and crush them has nearly 1 million times the processing power of the machine which landed on the moon. So just as it was hard for the scientists of the 60s to imagine what we could do with more computing power (Angry Birds probably wasn’t in their thoughts), it is just as hard for us to determine what we can do with machine learning backed up by current computing power.

So, welcome to the future. A future where computers no longer just compute. A future where processors think.

Luke Berry, Speechmatics

What does it take to be a non-techie in a deep technology company?

It all began at the Cambridge University Entrepreneurs (CUE) awards ceremony 3 years ago, where I won an award on the night with £1,000 prize money. And today I’m sitting in our new offices, surrounded by 32 full-time employees watching as our company continues to scale rapidly in 2017.

So how did it happen?

I first met Tony at a CUE networking event. My initial reaction was “100% no way that I will get involved in a software company”. A few weeks later Tony posted in the CUE Facebook group and we got talking again. I was already involved in several other start-ups but, after some consideration, I thought it might be worth a shot. Tony had developed the technology and frameworks but needed someone to commercialise the company.

I started working with Tony when there were only 5 employees at the company. My colleague and I spent our days cold-calling anyone and everyone we could to try to get a sale. It was a numbers game. Eventually someone was going to need and want us. And they did. By the end of 2014 we had doubled in size to 10 people, still working out of a shed.

We focused our efforts on our cloud-based system which we needed to ensure was commercially appealing. When Google made their Speech API available, we knew we had to beat them. So, we ran tests and found that we were 30-50% more accurate than Google.

So where are we now?

We have been generating strong revenues since we commercialised in 2014, and we closed an investment round at the end of 2016. We have expanded our capabilities to provide our customers with technology that can be deployed on-premises, and we recently launched our new real-time technology, which can be used offline, on a device. We also developed an automated framework that gives us the ability to develop any language in the world in a matter of weeks. Together, these overcome the three fundamental challenges with current products on the market today.

Some advice?

  • Pick up the phone and speak to anyone
  • Be tentacular, speak to everyone you can in an organisation
  • Don’t be too smart, there is always someone smarter
  • Always manage expectations
  • Find your focus as a company
  • Challenge people

The CUE event gave me the platform I needed to meet the right contacts at the right time. Jules Robertson, CUE speakers representative, says: “We encourage entrepreneurs to develop their ideas and skills as future company leaders. With the multiple rounds of competitions, we reward £100, £1000 and £5000 to individuals or teams to help elevate them to produce a successful company. We hold the awards and networking events to help introduce our entrepreneurs to contacts and hear from industry experts.”

Who says you can’t lead a deep technology company as a non-techie?

Benedikt von Thuengen, Speechmatics

The language of Romance

When we hear Catalan nowadays we think of Barcelona, of Gaudí, of Miró and of the strange-sounding language with hints of Spanish that so many Barcelona dwellers speak, and this makes us question why we bothered learning Spanish at school.

However, the breadth and depth of Catalan stretches much further than that. It is spoken across Catalonia and, in several variants, in Valencia, the Balearics, parts of France and Sardinia, and it is even the official language of Andorra.

It is a language that has had a chequered history, declared socially unacceptable or illegal on numerous occasions throughout Spain’s past, yet it has continually rebounded, even having its own Reneixença (renaissance) in the 19th century. Today, there are between 10 and 15 million speakers worldwide. And while it shares similarities with its cousins among the Romance languages that evolved from Latin (predominantly Italian, French, Spanish, Portuguese and Romanian), Catalan is very much its own language, with a rich history of poetry, culture and art, and important agency across industry, politics and commerce.

This became particularly notable last year when we released our Spanish automatic speech recognition system, which generated significant interest across Spain, South America and beyond. However, we also quickly realised that, whether for call centres or media monitoring, education or subtitling, potential customers had a need for Catalan as well. Spanish was a good start, but for true coverage across Spanish-speaking countries we simply had to have Catalan.

The Speechmatics languages team took it on as part of our traditional Christmas hackathon, and within 2 weeks we had a fully operating system. It is available on our website and has already attracted significant interest from users of both our public cloud-based platform and our private on-premises implementation.

Carme Calduch, a Catalan Lecteur at Cambridge and Queen Mary University of London, tested our cloud-based Catalan transcription service. “Speechmatics has developed a fantastic tool, doing a manual transcription is a laborious task and using this service I was able to obtain a transcript in less than 3 minutes. It is completely automatic, and the result was impressive, because the quality of the transcription is very good. Without a doubt, I’d recommend using this service – it will be really useful to professionals across multiple industries.”

With the list of languages available on the Speechmatics speech recognition system growing rapidly, we welcome everyone to sign up for a free trial of our public speech transcription service at www.speechmatics.com.

Ricardo Herreros-Symons, Speechmatics

Detecting the pattern of speech

By now it is pretty obvious that speech recognition is taking over the world, and so long as it doesn’t go all HAL 9000 on us, the future looks very interactive. The promise of a world where lights, TVs and coffee machines can be activated without touch has already been realised. But this is a (relatively) simple process of taking what the device thinks it has heard and cross-checking it against a known list of commands (I have completely oversimplified this; the time taken to reach this technical epoch of voice control is some indication of the complexity of the process).
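
To make that cross-checking step concrete, here is a minimal sketch in Python. It assumes the recogniser has already produced a text guess, and it simply looks for the closest entry in a list of commands using the standard library's fuzzy string matching. The command list, the `match_command` name and the 0.75 cutoff are all made up for illustration; real voice-control products are far more involved than this.

```python
from difflib import get_close_matches

# Hypothetical command list, for illustration only.
KNOWN_COMMANDS = ["lights on", "lights off", "tv on", "tv off", "make coffee"]


def match_command(heard, commands=KNOWN_COMMANDS, cutoff=0.75):
    """Return the known command closest to the recognised text, or None.

    `heard` is whatever text the recogniser thinks it heard. The cutoff
    (an assumed value) sets how similar the text must be to a command.
    """
    # Lower-case and collapse whitespace so trivial differences don't matter.
    normalised = " ".join(heard.lower().split())
    matches = get_close_matches(normalised, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

With this sketch, a slightly misheard phrase such as "make cofee" still resolves to "make coffee", while something unrelated such as "what is the weather" matches nothing and is ignored.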

We are now seeing more companies working on tech which can delve into the complexities of our voice. Banks are now using your voice as your password, and emotion engines can detect a caller’s emotion during a call to a call centre, giving the operative a heads-up if things are about to go south and suggesting actions they can take to resolve the issue.

The latest breakthrough comes from Canary Speech, a US start-up, which has developed a way of analysing phone conversations for signs of several neurological diseases, ranging from Parkinson’s to dementia.

A pinch of reality may be required, though. These are early days for the start-up. Don’t expect GPs to be replaced by a microphone anytime soon, but between machine learning and voice recognition there is no reason why we couldn’t start to see chatbots being used for front-line GP care, culminating in the inevitable prescription of ibuprofen, a staple of the British doctor. The tech that Canary are investigating is not yet fully mature; they will need to rely heavily on large data sets gathered over a sensible period of time to teach their machine how to identify a problem effectively.

How this technology will be rolled out is a big issue to consider. At the moment, most calls to call centres are recorded for monitoring and quality purposes; that’s monitoring of the call centre operatives, not the caller. I’m not sure many people would appreciate being told by Vodafone that it has identified signs of dementia in their voice. That’s all yet to be ironed out as we get to grips with more of our data being analysed.

From Speechmatics’ point of view, the more research that goes into neural networks and machine learning the better. In-house, we are getting better and better at finding more efficient ways of turning speech into text. We can now do on a phone what five years ago required banks of graphics processors. This has come about because the collective knowledge of computer science has advanced so much in the last 5 years. Research breeds research. The more uses there are for speech recognition, the more ways it can be streamlined.

Luke Berry, Speechmatics