Date:
May 19, 2022

Cutting-Edge Uses for Artificial Intelligence and Machine Learning In Streaming

Trust is key in any relationship, even (or maybe especially) with artificial intelligence/machine learning. The premise is that with enough training and modeling, software will provide insights that would not have been apparent to mere mortals. All the solutions providers I spoke with for this article say they rely on machine learning to generate analysis and also count on additional contextual information to supplement the software, to make sure that the recommended decision has appropriate context.

The top use of AI/ML, which 58% of developers expected to adopt according to the 2020 Bitmovin Developer Report, was creating recommendations; 50% of developers expected to use it for personalization. The previous year, the survey showed that more than 30% of developers were preparing to use AI/ML. But there are plenty of other uses for AI/ML, and those are what we'll focus on in this article.

Some companies spoke about AI/ML like it was a magic black box with their own proprietary special sauce. Without more details, I just don't have that much trust. What I do understand is algorithms are trained on data, and getting that data into a consistent format is a large, intensive exercise. The AI/ML is taking data, which is either unstructured in some way or inconsistent, and then generating a consistent set of categories or structured knowledge. "It's not the data per se, it's making sure that the AI itself is perceiving the data in the correct way," says Randa Minkarah, co-founder and chief operating officer, Resonance.AI.

"It's hard to say how long it will take [to do data normalization], but delivering smart algorithms that produce insights is a function of the size of data sets; the more subscribers and engagement actions you have, the faster you can perfect it," says Paul Pastor, chief business officer at Firstlight. "It's a volume and throughput game: A company with the scale of a Netflix can refine algorithms very quickly; for a service with a couple of hundred thousand subscribers, it's going to take longer."

Unstructured Data

According to a Gartner report in 2019, 80% of your data will be unstructured in the next 5 years. One of the biggest problems with all this unstructured data is identifying what's within the content. To gain insight into the unstructured data of video, there are small and large companies alike working on building machine learning models for tasks like sentiment analysis, named entity recognition, topic segmentation, text summarization, relationship extraction, mood, activity, color, locations, QoS, translation and sub-titling, contextual advertising placement and more.

"I think media companies are considering AI and ML. I don't think they've adopted it anywhere near as much as they can," says Minkarah. "I think that they're still wrangling their data to an extent and figuring out how do I deploy this (platform) and make it meaningful. I think that it's hard to deploy AI right now without having a consultant or a consulting group help you operationalize it."

A common complaint I hear is that content metadata isn't standardized to the extent it needs to be, which means that moving to a new OVP or CMS may involve a lot of work filling in the blanks. "We do use Gracenote and ROVI data to normalize, but even their data sets may not be complete," says Pastor. "Plus an OTT provider that wants to develop a competitive advantage might want to capture more data than is widely available."

Likely your existing vendors have something using AI/ML. The question is, what's worth trying out now?

To the Cloud and Beyond

About half of SDVI's customers are using AI/ML within their workflows, says Simon Eldridge, chief product officer, SDVI. "It's certainly more than were doing it two years ago. It tends to not be the first thing that a customer does [it's more like the second focus]." The first thing on many customers' roadmaps is actually migrating to the cloud, getting their content there, and moving their supply chains. "After they've done that initial work, then the AI tools are really an optimization cycle."

"The thing that I think has happened is there are many more smaller, single-purpose AI tools that do one thing like content deduplication, comparing the content, as people migrate to the cloud from multiple different sorts of storage," he says. "It's pretty common that you bought the same thing multiple times over because maybe you had it in your tape library and on spinning disc somewhere." I like this approach, because it tells me I haven't been hoarding copies of something in multiple locations and, just as Marie Kondo suggests, clutter is bad.
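
To make the deduplication idea concrete, here is a minimal sketch in Python. It assumes the simplest possible case, byte-identical copies found by hashing, which is not how the AI tools Eldridge describes work (they compare the content itself, so they can also catch re-encoded copies). The file paths and the `find_duplicates` helper are hypothetical.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return a SHA-256 fingerprint of an asset's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(assets: dict[str, bytes]) -> list[list[str]]:
    """Group storage locations whose bytes hash to the same digest.

    `assets` maps a (hypothetical) storage location to the asset's bytes.
    Only groups with more than one location are returned.
    """
    groups = defaultdict(list)
    for location, data in assets.items():
        groups[content_digest(data)].append(location)
    return [sorted(locations) for locations in groups.values() if len(locations) > 1]


# Illustrative stand-ins for the same episode held on tape and on disk.
assets = {
    "tape/ep01.mxf": b"episode-one-bytes",
    "disk/ep01_copy.mxf": b"episode-one-bytes",
    "disk/ep02.mxf": b"episode-two-bytes",
}
duplicates = find_duplicates(assets)
```

In a real library the bytes would be streamed from storage in chunks rather than held in memory, and a perceptual comparison would be needed to match copies that differ in encoding.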

Text Identification

Identification of text within content is another AI/ML focus. "For internationalization you don't want [the wrong language in the] lower thirds," says Eldridge. Another SDVI tool scans for text and identifies the time code for the offending characters. While locating text seems pretty straightforward, a number of companies will identify wholesale transcripts of what's within the content. While we've also covered this before, it bears repeating that scanning hundreds of hours of content will take hundreds of hours without AI/ML.

For news and entertainment production, Digital Nirvana offers "the ability to generate a transcript of a feed automatically [using our recently introduced Metadata IQ SaaS product]," says CEO Hiren Hindocha. "If you're running a news organization and it is ingesting hundreds of feeds from different sources, today the problem is that there aren't enough people to tell you what is in the feeds. There's just more data than can be humanly looked at."

They generate a transcript of a feed automatically to summarize what's in each feed. "We'll be able to allow the analyst or the editor to say, okay, this feed looks interesting because it has this and that. The data has been summarized by the machine."

Digital Nirvana's Trance generates transcripts automatically in multiple languages.

"Today what the system does is it generates transcripts of the audio feed that is coming in. The visual side of it is doing facial recognition of the content to be able to say that I'm seeing Rachel Maddow in this, in this video feed at this point in time, I'm seeing President Biden at this point in time." Also on the roadmap is the ability to summarize content and to identify the important concepts.

Contextual Advertising

Contextual advertising has become an important category because of the privacy restrictions which have happened in Europe and the US. For a while I've asked how people are doing this and till now I haven't had a satisfactory explanation.

"We've been working on associating the AI to recognize and categorize video based on IAB taxonomy," says Mika Rautiainen, CEO of Valossa Labs. That taxonomy contains many advertising-targeted topics, which can be applied to the entire piece of content as well as to specific parts of it. "We have developed AI to be very detailed, so that it finds even sections within the video where these advertisement categories have been portrayed."

"It can recognize grocery shopping at the beginning of the video. Then there's cooking in the middle of the video and at the end that might be an alcoholic drink or beverage. These categories are easily digestible," he says. "We provide this IAB categorization that gives you advertisable categories for the entire video piece. For this example, since alcoholic beverages is one of the categories that is popping up and is considered to be a sensitive content category, any advertiser who wants to exclude this they can decide if the content is brand safe." Creating a list of timecodes exactly where these topics happen makes inserting a contextual advertisement very accurate.
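
As a rough illustration of how timecoded categories could drive a brand-safety decision, here is a minimal Python sketch. It is not Valossa's implementation; the `Segment` type, the category labels (plain strings rather than real IAB taxonomy identifiers), and the timings are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A span of video tagged with one advertising category (hypothetical)."""
    start_s: float
    end_s: float
    category: str


def is_brand_safe(segments: list[Segment], excluded: set[str]) -> bool:
    """True when no segment falls into a category the advertiser excludes."""
    return all(seg.category not in excluded for seg in segments)


def safe_ad_slots(segments: list[Segment], excluded: set[str]) -> list[Segment]:
    """Segments next to which a contextual ad could still be placed."""
    return [seg for seg in segments if seg.category not in excluded]


# The grocery / cooking / drink example from the quote, with made-up timings.
segments = [
    Segment(0.0, 45.0, "Grocery Shopping"),
    Segment(45.0, 180.0, "Cooking"),
    Segment(180.0, 210.0, "Alcoholic Beverages"),
]
excluded = {"Alcoholic Beverages"}

whole_video_safe = is_brand_safe(segments, excluded)
slots = safe_ad_slots(segments, excluded)
```

An advertiser that excludes the sensitive category can either skip the whole video or, using the per-segment timecodes, target only the grocery and cooking sections.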

Flagging Content

Content identification can be used to automatically identify specific actions. This might include sports activity like goals, home runs, baskets, etc. or controversial events like violence, nudity, sexual behaviour, smoking, drinking and drug use. There are two ways to accomplish this.

"If your job is content internationalization, for example, previously, you would have watched a show and it would have taken two hours or whatever to go through and find the bad bits and cut them out and make a new version," says Eldridge. "Whereas now they can do it in 10 minutes per show because they're not watching the whole thing in slow motion and looking for things, they're using markers on the timeline to say, show me the next event."
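
The "show me the next event" workflow can be sketched in a few lines of Python. This is an assumption about how such a timeline might be modeled (a sorted list of marker times in seconds), not a description of SDVI's product; `next_marker` is a hypothetical helper.

```python
import bisect
from typing import Optional


def next_marker(marker_times: list[float], playhead_s: float) -> Optional[float]:
    """Return the first marker strictly after the playhead, or None.

    `marker_times` must be sorted ascending; each value is the time in
    seconds at which a flagged event starts.
    """
    i = bisect.bisect_right(marker_times, playhead_s)
    return marker_times[i] if i < len(marker_times) else None


# Made-up marker times for three flagged events in a show.
markers = sorted([312.5, 47.0, 1290.0])

first = next_marker(markers, 0.0)
second = next_marker(markers, first)
```

Because the reviewer jumps from marker to marker instead of watching linearly, review time depends on the number of flagged events rather than the running time of the show.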

Automated Trailer Creation

Several years ago there was an effort to build a short film using ML/AI called Sunspring, which was very funny, but not very sensical. Even today, I've been assured I wouldn't be able to tell if something was written by a piece of software or a person. While I do think I can tell, there are uses which don't require very complicated applications. Creation of trailers and highlight clips are both good uses for AI/ML.

"[Commercial] Broadcaster MTV 3 in Finland is using our AI to create promotional highlights of their TV series and movie trailers for their video on demand platforms," says Rautiainen. This broadcaster doesn't have the resources to create trailers for all of their content, but Valossa does it for them. "They are really creating more dynamic and engaging consumer experiences within those services and helping customers to get into the content faster to assess and understand whether the content is appropriate." "AI is actually generating multiple outputs, it looks into the content once and then produces lots of different outputs. One might be action oriented, another one is more emotional," he says. The trailers are much better than Sunspring.

Content Matching

Matching video content to webpages has been something JW Player has been doing for a few years now. "We do have a lot of metadata that is associated with video, like descriptions and sometimes captions and transcripts," says chief technical officer Dave LaPalomento. "So we're able to pull out the thrust of what the video was about matching those two things together, which is something search engines have been doing for a long time."

The primary thing you need to do is put a little bit of code into the pages to which you want article matching applied. The code will be replaced with the selected video match for that article. You could also apply business rules to it if you're so inclined. Is anyone still doing this by hand, or has content matching via AI/ML replaced the human editor? "It's got a decent base now, but we still see a lot of room to grow across our customers."
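
To show the general shape of text-based matching, here is a minimal Python sketch that scores video metadata against an article using bag-of-words cosine similarity. This is a stand-in technique chosen for illustration, not JW Player's method; the video IDs, sample text, and helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def best_match(article: str, videos: dict[str, str]) -> str:
    """Return the ID of the video whose metadata is most similar to the article.

    `videos` maps a video ID to its description, captions, or transcript text
    and must not be empty.
    """
    article_vec = tokenize(article)
    return max(videos, key=lambda vid: cosine(article_vec, tokenize(videos[vid])))


videos = {
    "v1": "How to grill salmon with lemon and herbs",
    "v2": "Highlights from the championship basketball final",
}
article = "Five easy salmon recipes for the grill this summer"
match = best_match(article, videos)
```

A production system would likely weight rare terms more heavily or use learned embeddings, and business rules (for example, excluding certain videos from certain pages) could be applied as a filter on `videos` before scoring.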

Quality of Service

There are a number of different tools in this bucket. "Our mode of measurement is by inserting an SDK directly into the streaming publishers apps to measure full census, real time continuously," says Aditya Ganjam, chief product officer at Conviva. "We measure the viewer's experience—their engagement, behaviors, what they're watching, what device they're on, where they are."

At any given time there may be 20 or 30 million simultaneous streams playing, and Conviva provides actionable insights to DevOps, engineering, product, marketing and ad monetization teams, using AI and ML to determine the root cause of what people are reacting to. "We're more focused on finding insights from data that either automatically or human can take action on," Ganjam says.

It used to be all about the aggregate information based on traffic patterns. Now they are seeing more attention to two metrics used for customer acquisition: One is how fast you can acquire customers. The second is how long they can be retained. "We noticed that our customers are actually zooming in a lot more," says Hui Zhang, founder and chief scientist, Conviva.

Cultural Categorization

This next company uses AI and ML to categorize content for the ratings and international distribution requirements that arise as more media companies expand into worldwide streaming distribution. "We've been able to acquire this deep set of knowledge, create our own methodologies, systematize and build technology that helps us identify and classify cultural events," says Teresa Phillips, CEO, Spherex. They use computer vision to look at what's on the video, natural language processing (NLP) to look at the subtitles and a synthesis of independent signals to evaluate content meaning.

The company has classified ratings for content for 200 countries and 246 territories. "The machines take up all the edge cases that we as humans can't possibly sit around and think about and turn into rules. There's a lot of rules that we have generated, but a lot of countries either are inconsistent in how they apply their own rules or their rules are unpublished or unspoken."

Spherex's Greenlight identifies multiple categories of sensitive content, based on classifications for 200 countries and 246 territories.

A recent study in New Zealand showed that school settings depicting incidents of self-harm, bullying, excessive body art, or eating disorders on screen are triggers for teens.

"[Using AI and ML we've identified] what's suitable for children for nine years old, 12 years old, 15 years old, 18 years old," says Phillips. This is based on what they call classifiable cultural elements found in film and television that cultures care about, like violence, sexuality, nudity, blasphemy, suicide, and drugs. "Our systems have a set of very complicated rules that when we extract a cultural event, let's say that it's drugs, for instance we have to identify a lot of context around that," says Phillips. "What's going on? Is it just presented on this screen? Is it being used? Is it in a party setting? Is it being sold? Is it glamorized or encouraged? Are there consequences to doing illegal drugs, and then there's kind of a lesson there?"

Honorable Mentions

Now, as opposed to the last time we did this article, there are a lot more active uses of AI/ML. There are also all sorts of things which we didn't include simply because of space limitations. There's compliance logging, including automatic translation and caption generation, also from Digital Nirvana. There's full-on metadata normalization from Firstlight Media when clients bring new content libraries into their CMS. There's advertising placement optimization from JW Player, helping customers get better prices knowing the ads will be seen and engaged with. There's the whole side of content recommendation and personalization. There's automated editing, CDN traffic management, content engagement evaluation, and the list goes on.

The categories here all seem to be solving immediate, clear problems. There are a lot of products on the market promising AI/ML applications, for advertising for example, that I have a hard time writing about simply because their benefits are too murky.

However, this article was about existing use cases. Our article on FAST includes interviews about future use cases using AI/ML which seem very likely, but not yet in production.

There's something to be said for the old adage, "trust but verify." Likely there are areas where AI/ML can be left on its own, but models need to be trained, and once trained they continually need care. Do I trust a machine to police itself? Nope. While all of these technologies are amazing, I still don't trust them fully. Yet.

Source: Streaming Media

Related Insights

YouTube Thumbnails Can Get You in Trouble

Here’s Why Creators Should Pay Attention

When we talk about content compliance on YouTube, most people think of the video content itself — what’s said, what’s shown, and how it’s edited. But there’s another part of the video that carries serious consequences if it violates YouTube policy: the thumbnail.

Thumbnails aren’t just visual hooks — they’re promos and they’re subject to the same content policies as videos. According to YouTube’s official guidelines, thumbnails that contain nudity, sexual content, violent imagery, misleading visuals, or vulgar language can be removed, age-restricted, or lead to a strike on your channel. Repeat offenses can even result in demonetization or channel termination. That’s a steep price to pay for what some may think of as a simple promotional image.

The Hidden Risk in a Single Frame

The challenge? The thumbnail is often selected from the video itself — either manually or auto-generated from a frame. Creators under tight deadlines or managing high-volume channels may not take the time to double-check every frame. They may let the platform choose it automatically. This is where things get risky.

A few seconds of unblurred nudity, a fleeting violent scene, or a misleading expression of shock might seem harmless in motion. But when captured as a still image, those same moments can trigger YouTube’s moderation systems — or worse, violate the platform’s Community Guidelines.

Let’s say your video includes a horror scene with simulated gore. It might pass YouTube’s rules with an age restriction. But if the thumbnail zooms in on a blood-splattered face, that thumbnail could be removed, and your channel could be penalized. Even thumbnails that are simply “too suggestive” or “misleading” can get flagged.

Misleading Thumbnails: Not Just Clickbait — a Violation

Another common mistake is using a thumbnail that implies something the video doesn’t deliver — for example, suggesting nudity, shocking violence, or sexually explicit content that never appears in the video. These aren’t just bad for audience trust; they’re a clear violation of YouTube’s thumbnail policy.

Even if your content is compliant, the wrong thumbnail can cause very real problems.

The Reality for Content Creators

It’s essential to recognize that YouTube’s thumbnail policy doesn’t exist in isolation. It intersects with other rules around child safety, nudity, vulgar language, violence, and more. A thumbnail with vulgar text, even if the video is educational or satirical, may still result in age restrictions or removal. A still frame with a suggestive pose, even if brief and unintended in the video itself, can be enough to get flagged.

And for creators monetizing their work, especially across multiple markets, the risk goes beyond visibility. A flagged thumbnail can reduce ad eligibility, limit reach, or cut off monetization entirely. Worse, a pattern of violations can threaten a channel’s long-term viability.

What’s a Creator to Do?

First, you need to know how to spot the problem and then know what to do about it. Second, you need to know if the changes you make might affect its acceptance in other markets or countries. Only then can you manually scrub through your video looking for risky frames. You can review policies and try to stay up to date on the nuances of what YouTube considers “gratifying” versus “educational” or “documentary.” But doing this at scale — especially for a growing content library — is overwhelming.  

That’s where a tool like SpherexAI can help.

A Smarter Way to Stay Compliant

SpherexAI uses frame-level and scene-level analysis to flag potential compliance issues, not just in your video, but in any frame that could be selected as a thumbnail. Using its patented knowledge graph, which includes every published regulatory and platform rule, it will prepare detailed and accurate edit decision lists that tell you not only what the problem is, but also how it applies to each of your target audiences. Whether you're publishing to a single audience or distributing globally, SpherexAI checks your content against YouTube’s policies and localized cultural standards.

For creators trying to grow their brand, monetize their work, and stay in good standing with platforms, that kind of precision can mean the difference between success and a takedown notice.

Want to know if your content is at risk? Learn how SpherexAI can help you protect your channel and optimize every frame — including the thumbnail. Contact us to learn more.


Automating Peace of Mind: Navigating YouTube's Global Guidelines with SpherexAI

For media companies distributing content across YouTube, compliance is no longer just a legal requirement—it’s a prerequisite for discoverability, monetization, and channel survival. YouTube enforces strict policies governing child safety, vulgarity, graphic content, and cultural sensitivity. For content owners, ensuring compliance across multiple categories and geographies is a complex and labor-intensive process. To address this issue, SpherexAI provides a scalable solution tailored for any content creator or owner.

YouTube’s Expanding Compliance Landscape

YouTube’s Community Guidelines cover a wide array of regulated categories. Content can be removed or age-restricted—and creators may face penalties—if videos violate policies on:

  • Nudity and sexual content: Content that includes sexually gratifying imagery or non-consensual sexualization is prohibited.
  • Violence and graphic imagery: Footage showing serious injury, bodily fluids, or torture intended to shock viewers can be flagged or removed.
  • Child safety: Content that exploits minors, includes inappropriate family content, or features children in dangerous stunts is not allowed.
  • Illegal or regulated goods: YouTube restricts promotion of firearms, narcotics, and gambling services, among others.

Managing compliance with each of these categories—especially when content is global and multilingual—is a logistical challenge for distributors.

Enter SpherexAI: Precision Compliance Automation at Scale

SpherexAI applies multimodal AI to analyze video content across dialogue, visuals, audio, and metadata. It detects compliance issues not only by scanning for policy violations but also by identifying subtle cultural or regional sensitivities that could result in content removal or limited distribution.

For example, the platform flags:

  • Dialogue with excessive profanity or sexual references, aligned with YouTube’s vulgar language policy.
  • Visuals showing partial nudity, firearm use, or dangerous stunts, which may trigger strikes or age restrictions.
  • Culturally sensitive depictions—such as religious imagery or portrayals of death—that may violate local norms and platform rules.

SpherexAI outputs include timestamped alerts and severity levels, allowing content owners to make targeted edits rather than performing full manual reviews.
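
To illustrate how timestamped alerts with severity levels could translate into targeted edits, here is a minimal Python sketch. The `Alert` structure, the 1-to-3 severity scale, and the labels are assumptions made for the example; they are not SpherexAI's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """A flagged span of video (hypothetical structure)."""
    start_s: float
    end_s: float
    severity: int  # assumed scale: 1 (low) to 3 (high)
    label: str


def edit_ranges(alerts: list[Alert], min_severity: int) -> list[tuple[float, float]]:
    """Merge overlapping alerts at or above `min_severity` into edit ranges."""
    spans = sorted((a.start_s, a.end_s) for a in alerts if a.severity >= min_severity)
    merged: list[tuple[float, float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


alerts = [
    Alert(10.0, 14.0, 3, "profanity"),
    Alert(12.0, 20.0, 2, "partial nudity"),
    Alert(95.0, 99.0, 1, "smoking"),
]
ranges = edit_ranges(alerts, min_severity=2)
```

With a threshold of 2, the two overlapping alerts collapse into a single range to review, and the low-severity alert is left for a later pass.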

Equal Rules for All Creators

Whether you’re a major studio releasing film clips or a digital-first creator uploading your first series, YouTube holds all content publishers to the same standards. Community Guidelines are enforced platform-wide, regardless of a channel’s size, history, or market familiarity.

This presents a significant challenge for new entrants. Many first-time creators or distributors may be unaware that a thumbnail featuring misleading imagery, a prank involving minors, or a scene with unedited drug references can lead to demonetization or a channel strike. But YouTube’s enforcement is uniform: content that violates policy is subject to the same sanctions across the board.

SpherexAI helps level the playing field by equipping every content team—regardless of experience—with access to the same tools used by top studios. Its patented knowledge graph, built on over a decade of regulatory insight and expert human annotation, powers its AI models with unmatched precision. The result: faster reviews, greater accuracy, and fewer costly mistakes.

Cross-Platform, Region-Aware, and Regulation-Ready

Unlike tools focused on metadata or age ratings alone, SpherexAI delivers:

  • Granular analysis: Scene-by-scene breakdowns for violence, vulgarity, sexual content, and self-harm risks.
  • Cultural intelligence: Predictive models assess content suitability across 240+ territories using Spherex’s proprietary “cultural distance” framework.
  • Workflow integration: The platform’s API allows integration into existing supply chains and CMS platforms for automated review at scale.

Reducing Risk, Unlocking Revenue

YouTube’s monetization eligibility hinges on content safety. Channels can be demonetized or de-prioritized in search and recommendation if flagged for repeated violations. Well-known creators Logan Paul, ScreenCulture, and LH Studios have all been sanctioned for violations. By proactively identifying and resolving compliance issues before publishing, SpherexAI empowers content owners to:

  • Avoid strikes or takedowns
  • Retain monetization rights
  • Accelerate time-to-market
  • Protect brand reputation

Conclusion

YouTube is a dynamic platform for global content distribution that requires rigorous adherence to evolving content standards. For studios, broadcasters, and new creators alike, SpherexAI offers an AI-powered safety net automating policy compliance while preserving creative integrity. When SpherexAI is integrated into your production workflow, you can publish confidently at scale, with full compliance, and with no brand risk.

Ready to streamline compliance and expand your YouTube strategy globally?

Book a demo or visit spherex.com to learn how SpherexAI can support your team.


Spherex CEO Teresa Phillips Talks Practical AI for Global Content Localization at EnTech Fest

At this year’s DEG EnTech Fest, Spherex CEO and Co-Founder Teresa Phillips joined a panel to explore one of the most practical and impactful uses of AI in entertainment today: localization.

During the session titled “Practical AI For Speed and Savings in Localization,” Phillips shared how Spherex is leveraging AI to deliver “deep video understanding” that accelerates compliance and rating decisions in over 200 markets. As she explained, understanding the context—cultural, visual, and narrative—is crucial in determining whether a piece of content is suitable for audiences worldwide.

“AI can now detect not just what happens in a scene, but how it might be interpreted in different cultural and regulatory environments,” said Phillips. For example, in Scandinavian countries, if a trusted figure, such as a clergy member, commits an unethical act onscreen, it can dramatically impact a film’s age rating. SpherexAI is trained to identify these nuanced moments, flagging them for human review when needed.

Phillips also highlighted the role of AI in augmenting human decision-making, noting that “AI agents can be trained to ask humans the right questions—like whether the drinking in a scene is casual or excessive—ensuring more consistent, scalable evaluations.”

The conversation also acknowledged the broader industry shift that AI is bringing to localization workflows—from quality control (QC) to artwork generation, compliance, and project management. With automation poised to displace some entry-level roles, Phillips raised a key question for the future: “If junior roles are the first to be automated, how do we bring new talent into the industry? We have a responsibility in our organizations to create opportunities for the next generation.”

Joining Phillips on the panel were Silviu Epure (Blu Digital Group), Chris Carey (Iyuno), Kelly Summers (The Sherlock Company), and Duncan Wain (Zoo Digital), offering a 360° view on how AI is transforming the way stories cross borders.
