Making AI work in the real world: Latest web training series focuses on model behavior and control
AI models don’t just work or fail. They learn, adapt and sometimes stall or go off track. The challenge is understanding why they behave the way they do and how to guide them back on course.
Season two of Rakuten Cloud president Partha Seetala's AI training series, A Comprehensive and Intuitive Introduction to Deep Learning (CIDL), dives into that challenge. It focuses on how neural networks actually learn and what we can do to influence and improve that learning.
Last summer, we shared top takeaways from season one, which introduced how neural networks are structured and how data flows through them. Season two picks up from there, unpacking the mechanisms behind model learning and moving from processing and understanding patterns in purely numeric data to sequence data, like text and time series.
“Too often, AI looks like magic,” says Partha. “But it’s not magic, it’s just a clever method. Engineers especially need to see behind the curtain. Once we understand why and how the magic works, we're no longer just admiring the illusion, we're learning the craft, ready to design and perform our own.”
Season two, now available on YouTube, is about making that method intuitive. It gives engineers, product teams and decision-makers the insight to shape, debug and tune AI models in real-world applications, especially as AI becomes foundational to network operations, infrastructure management and enterprise software.
Here, we highlight five key takeaways from Partha's training to offer a glimpse of what you will learn about training AI models and controlling how they learn and perform.
#1 Word embeddings turn words into meaningful numbers
Neural networks can't process raw text; they need to be fed numbers. But simple numeric IDs or one-hot encodings don't work: they are sparse, grow with the size of the vocabulary and treat every word as unrelated to every other.
Word embeddings solve this by converting words into dense, low-dimensional vectors where semantic relationships are captured in geometry. Words like “apple” and “orange” end up closer in vector space than “apple” and “firewall”.
Techniques like Word2Vec (CBOW & Skip-Gram) predict context or target words to learn embeddings from usage patterns, while GloVe uses global co-occurrence statistics to achieve similar results. The result? Models gain a form of “understanding” about word meaning, making downstream Natural Language Processing (NLP) tasks like classification, translation and search far more accurate and context-aware.
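To make this concrete, here is a minimal sketch using gensim's Word2Vec implementation on a tiny, hypothetical toy corpus. The corpus and hyperparameters are illustrative assumptions, not material from the training series:

```python
# A minimal sketch (assumes gensim is installed) of Word2Vec-style embeddings,
# where words that appear in similar contexts end up with similar vectors.
from gensim.models import Word2Vec

# Tiny, hypothetical pre-tokenized telecom support messages (illustrative only).
corpus = [
    ["customer", "reports", "high", "latency", "on", "the", "network"],
    ["outage", "detected", "latency", "and", "packet", "loss", "spiking"],
    ["billing", "issue", "customer", "frustrated", "about", "invoice"],
    ["slow", "network", "latency", "complaint", "from", "enterprise", "customer"],
]

# sg=1 selects Skip-Gram; sg=0 would use CBOW instead.
model = Word2Vec(sentences=corpus, vector_size=50, window=3,
                 min_count=1, sg=1, epochs=200)

# Cosine similarity between learned vectors; with a realistic corpus,
# semantically related terms ("latency", "outage") score higher than
# unrelated ones ("latency", "billing").
print(model.wv.similarity("latency", "outage"))
print(model.wv.similarity("latency", "billing"))
```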
AI in action: Telecom operators receive thousands of customer messages daily via chats or emails. Word embeddings can capture the semantic meaning of telecom-specific terms (e.g., "latency," "outage," "billing issue") and general sentiment indicators ("frustrated," "slow," "down") to help automated support systems accurately classify customer inquiries, prioritize network issues or route customer tickets more effectively.
#2 RNNs give neural networks a memory
Feed-forward networks, which Partha presented last summer in Season 1, treat every word in a sentence independently, which is fine for numeric data but bad for language. That's because language is sequential: "The cat sat" is very different from "sat the cat."
Recurrent Neural Networks (RNNs) introduce a hidden state that flows through time, carrying context from one word to the next in a sentence or paragraph. This hidden state acts like "memory," allowing RNNs to remember relationships between words and data points as they process sequences like sentences, speech or even numeric time-series data. Each new input isn't seen in isolation but as part of an evolving story. For instance, in the sentence "I locked the door and left the ___," an RNN can use prior context to guess "house" instead of random guesses like "book" or "cat."
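As a quick illustration, here is a minimal PyTorch sketch (the framework choice is an assumption) of how an RNN carries a hidden state across a sequence:

```python
# A minimal sketch: an RNN passes a hidden state from step to step, so each
# token is processed in the context of everything that came before it.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 32, 64

embedding = nn.Embedding(vocab_size, embed_dim)        # token IDs -> dense vectors
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)  # hidden state flows through time

# A toy batch of one sequence of 5 token IDs (e.g., "I locked the door and ...").
token_ids = torch.randint(0, vocab_size, (1, 5))

outputs, h_n = rnn(embedding(token_ids))
print(outputs.shape)  # (1, 5, 64): one hidden state per time step
print(h_n.shape)      # (1, 1, 64): final hidden state summarizing the sequence
```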
AI in action: Enterprise security systems continuously generate logs that are highly sequential (e.g., login attempts, file access events). RNNs maintain context from these events over time, recognizing unusual sequences (e.g., repeated failed logins followed by privileged access attempts) to detect advanced persistent threats or insider threats.
#3 LSTMs solve the forgetfulness of RNNs
Vanilla RNNs are powerful, but they tend to be forgetful. When dealing with long sequences, gradients vanish during backpropagation, so earlier information fades away during learning. Long Short-Term Memory (LSTM) networks address this with a dual-memory mechanism comprising the cell state (i.e., long-term memory) and the hidden state (i.e., short-term memory).
Three gates (forget, input and output) decide what to remember, what to forget and what to expose. This architecture allows LSTMs to retain critical information across longer spans, like remembering the subject of a paragraph even after 30 words. That’s why they revolutionized tasks like speech recognition, document modeling and language generation before transformers took over.
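Here is a minimal PyTorch sketch (again an assumption, not code from the series) showing the LSTM's two memories, the hidden state and the cell state, returned after processing a sequence:

```python
# A minimal sketch of an LSTM's dual memory: the hidden state (short-term)
# and the cell state (long-term) are both returned after each forward pass.
import torch
import torch.nn as nn

embed_dim, hidden_dim = 32, 64
lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

# A toy batch: one sequence of 30 time steps (e.g., 30 words or 30 KPI readings).
sequence = torch.randn(1, 30, embed_dim)

outputs, (h_n, c_n) = lstm(sequence)
print(h_n.shape)  # (1, 1, 64): short-term memory exposed to the next layer
print(c_n.shape)  # (1, 1, 64): long-term cell state carried across all 30 steps
```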
AI in action: In the case of telecom operators, which monitor network KPIs (e.g., latency, packet loss, throughput) continuously over days, weeks or months, LSTMs retain crucial context over extended periods, identifying subtle performance degradation trends or seasonal patterns. This helps predictive maintenance proactively address potential issues.
#4 Seq2Seq models turn sequences into other sequences
Many NLP tasks involve converting one sequence into another. Think translating languages, summarizing articles or converting natural language to structured queries.
Sequence-to-Sequence (Seq2Seq) models tackle this with a two-part setup: 1) An Encoder RNN/LSTM compresses the input sequence into a context vector; and 2) A Decoder RNN/LSTM expands that context vector into an output sequence.
The context vector provides a snapshot of the entire input sequence. Rather than making the decoder learn everything from its own (initially poor) predictions, teacher forcing feeds it the correct previous word during training. This speeds up convergence and helps models learn complex mappings between input and output sequences, despite the rigid bottleneck of the fixed-size context vector.
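The following is a minimal, hypothetical PyTorch sketch of that encoder-decoder setup with teacher forcing; the class and variable names are illustrative, not taken from the series:

```python
# A minimal Seq2Seq sketch: the encoder compresses the input into a context
# vector, and during training the decoder is fed the ground-truth previous
# tokens (teacher forcing) instead of its own predictions.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encoder: the final (hidden, cell) pair acts as the context vector.
        _, context = self.encoder(self.embed(src_ids))
        # Teacher forcing: the decoder receives the correct target tokens
        # (shifted by one during training) rather than its own outputs.
        dec_out, _ = self.decoder(self.embed(tgt_ids), context)
        return self.out(dec_out)  # logits over the vocabulary at each step

model = Seq2Seq(vocab_size=1000)
src = torch.randint(0, 1000, (1, 12))  # e.g., a raw alarm/log token sequence
tgt = torch.randint(0, 1000, (1, 8))   # e.g., the human-readable summary tokens
print(model(src, tgt).shape)           # (1, 8, 1000)
```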
AI in action: For telecom networks, which generate complex logs or alarms, Seq2Seq models can translate dense, technical logs (input sequence) into simplified, human-readable summaries (output sequence) to speed up troubleshooting and help network operation centers (NOCs) diagnose problems more quickly.
#5 Attention mechanism creates dynamic focus for each word
The Seq2Seq bottleneck breaks down for long or complex inputs. Attention is the game-changing upgrade: it lets the decoder dynamically look back at every encoder hidden state instead of compressing everything into a single context vector. For every word it generates, attention asks: "Which input words are most relevant right now?"
This selective focus enables the model to align outputs with the right parts of the input, which is especially useful when word order differs (e.g., in translation) or when a single output word relates to multiple input tokens. Attention dramatically improves accuracy and interpretability while laying the groundwork for the transformer architecture that powers modern LLMs like BERT and GPT.
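Here is a minimal PyTorch sketch (an assumption, using simple dot-product scoring) of how attention weights each encoder hidden state for one decoder step:

```python
# A minimal dot-product attention sketch: score every encoder hidden state
# against the current decoder state, softmax the scores into weights, and
# build a context vector focused on the most relevant input positions.
import torch
import torch.nn.functional as F

hidden_dim, src_len = 64, 10
encoder_states = torch.randn(1, src_len, hidden_dim)  # one state per input token
decoder_state = torch.randn(1, hidden_dim)            # current decoder "query"

# "Which input words are most relevant right now?"
scores = torch.bmm(encoder_states, decoder_state.unsqueeze(2)).squeeze(2)  # (1, 10)
weights = F.softmax(scores, dim=-1)                                        # sum to 1

# Weighted sum of encoder states replaces the single fixed context vector.
context = torch.bmm(weights.unsqueeze(1), encoder_states).squeeze(1)       # (1, 64)
print(weights)  # the alignment the decoder uses for this output word
```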
AI in action: Enterprises generate extensive security logs from multiple sources, such as firewalls, IDS/IPS and endpoint monitoring. Attention-based models help accurately translate English log messages into machine-understandable API calls, ensuring the correct action is taken for remediation and issue resolution.
Watch season two now
Season two of A Comprehensive and Intuitive Introduction to Deep Learning offers engineers and business leaders the clarity and practical skills to engage more confidently with AI.
Whether you’re applying AI to optimize telecom infrastructure, automate enterprise workflows or build more responsive software systems, Partha’s approach breaks through complexity to deepen real understanding.
The full season is available now on YouTube, and Partha has confirmed in a recent LinkedIn post that season three is on the way.
Check it out now and mention Partha Seetala in the comments to ask a question or start a conversation.
Subscribe to the Zero-Touch newsletter https://www.linkedin.com/newsletters/zero-touch-7141109105443663872/