Josh Studl

Josh Studl

Pittsburgh, Pennsylvania, United States
2K followers 500+ connections

About

I'm an entrepreneurial, creative AI solutions engineer with specialization in engineering…

Activity

Join now to see all activity

Experience

  • Turbine Workforce Graphic

    Turbine Workforce

    Pittsburgh, Pennsylvania, United States

  • -

    Washington, District of Columbia, United States

  • -

    Pittsburgh, Pennsylvania, United States

  • -

    Greater Pittsburgh Area

  • -

    Washington, DC

  • -

    Washington D.C. Metro Area

  • -

    Washington D.C. Metro Area

  • -

    Washington, DC

  • -

  • -

    Washington, District of Columbia, United States

  • -

  • -

  • -

  • -

Projects

  • AI-Powered Content Generation Pipeline

    -

    Overview: Built an automated system producing 15+ professional web pages with AI-generated images for Turbine Workforce & Apprentage.
    • Automated Content Creation: Generated MDX/TSX pages, ASCII diagrams, and images via GPT-4 + Stability AI SD3
    • Visual System: Created 20+ specialized presets for dashboards, onboarding, compliance tools, OJT
    • Brand-Aware: Maintained style across Turbine, Apprentage, VELA, AITaaS
    • Sitemap-Driven: Bulk processed JSON sitemaps with structured…

    Overview: Built an automated system producing 15+ professional web pages with AI-generated images for Turbine Workforce & Apprentage.
    • Automated Content Creation: Generated MDX/TSX pages, ASCII diagrams, and images via GPT-4 + Stability AI SD3
    • Visual System: Created 20+ specialized presets for dashboards, onboarding, compliance tools, OJT
    • Brand-Aware: Maintained style across Turbine, Apprentage, VELA, AITaaS
    • Sitemap-Driven: Bulk processed JSON sitemaps with structured prompt architecture

    Technical Highlights:
    • Python async pipeline for concurrent LLM/image generation
    • ChromaDB for RAG-enhanced content
    • MLflow tracking, timestamped output directories
    • Consistent Industry 4.0 visual style

    Impact:
    • Cut content creation time from days to minutes per page
    • Scaled to 100+ pages with consistent quality
    • Enterprise-ready for CI/CD integration

    Stack: Python, OpenAI GPT-5 api, Stability AI SD3, ChromaDB, MLflow, Next.js, MDX, TypeScript, RAG pipelines

  • Model Context Protocol - Design & Implementation

    -

    Designed a universal schema for interoperability, auditability, and assistant-native operations.
    This work produced clear and positive product and budget impacts, as well as an extreme platform value add.
    Impacts:
    • 5x faster onboarding
    • Up to $400K annual operational savings
    • ≤0.5% error rates for compliance exports
    • 90% assistant adoption with >92% CSAT

    Assets:
    1. Universal Schema – MCP as the backbone contract for all modules
    2. Governed…

    Designed a universal schema for interoperability, auditability, and assistant-native operations.
    This work produced clear and positive product and budget impacts, as well as an extreme platform value add.
    Impacts:
    • 5x faster onboarding
    • Up to $400K annual operational savings
    • ≤0.5% error rates for compliance exports
    • 90% assistant adoption with >92% CSAT

    Assets:
    1. Universal Schema – MCP as the backbone contract for all modules
    2. Governed Interoperability – seamless flow across Ops and integrations
    3. Audit-Ready Architecture – versioned, secure, queryable data
    4. Assistant-Native Operations – explainable, compliant AI in every workflow

    Deliverables:
    • Base Signals Layer (Entity, Semantic, Knowledge, Process metadata)
    • Ops Modules (KnowledgeOps, LearningOps, OJTOps, ComplianceOps, ReportingOps)
    • MCFT Pipeline (Ingest → Enrich → Embed)
    • Immutable Compliance Ledger
    • Enterprise Integrations (SCORM/xAPI, RAPIDS, CWDS)

    Other creators
  • Video RAG App

    -

    Built from scratch a Retrieval-Augmented Generation (RAG) pipeline for video content, enabling structured metadata extraction, multimodal analysis, and downstream knowledge graph enrichment. Designed for compliance, training, and instructional video datasets.
    • Automated Frame Extraction: download Vimeo videos, extract frames (every 3–5 seconds), and validate images before processing.
    • Multimodal Metadata Generation: Integrated GPT-5 and DeepSeek multimodal models to parse frames into…

    Built from scratch a Retrieval-Augmented Generation (RAG) pipeline for video content, enabling structured metadata extraction, multimodal analysis, and downstream knowledge graph enrichment. Designed for compliance, training, and instructional video datasets.
    • Automated Frame Extraction: download Vimeo videos, extract frames (every 3–5 seconds), and validate images before processing.
    • Multimodal Metadata Generation: Integrated GPT-5 and DeepSeek multimodal models to parse frames into scene descriptions, workflows, narration, and knowledge graph triplets.
    • RAG Contextualization: Enriched video interpretation by retrieving relevant context from Qdrant (OpenAI & BERT embeddings), grounding outputs in compliance and training data.
    • Schema-First Outputs: Standardized results into the Turbine Scene Output Schema with fields like title, intent, actors, outcomes, triplets, and linked_documents.
    • Resilient Pipeline Design: Implemented error handling, safe JSON parsing, timestamped output directories, and SQLite persistence for structured metadata.

    Technical Highlights:
    • Python-based pipeline with async processing for concurrent frame encoding and GPT calls
    • FFmpeg + OpenCV for robust video frame extraction and base64 encoding
    • Qdrant vector DB with dual embeddings (OpenAI + BERT) for hybrid retrieval
    • DeepSeek & GPT-5 for multimodal LLM calls (image + text → structured JSON)
    • SQLite + JSON/Markdown output for portability and downstream fine-tuning

    Business Impact:
    • Transformed raw video into structured, searchable knowledge assets
    • Enabled compliance workflows, training analytics, and conversational UX assistants powered by video metadata
    • Reduced manual video annotation effort from hours to minutes per video
    • Established a scalable foundation for multimodal RAG systems across workforce and apprenticeship use cases

    Technologies Used:
    Python, FFmpeg, OpenCV, Chromadb, OpenAI GPT-5, SQLite, async/await, vector embeddings, JSON schema validation

  • Mulit-scenario Fine-tuned LLM from custom knowledge graph

    -

    This grueling solo project yielded a domain-specific, multi-task competent, fine-tuned LLM from custom knowledge graph. The source knowledge comprised 250+ software UI screen recordings on Zoom, and dozens of Zoom transcripts of customer sales demos. Extracted from transcripts were accurate and ranked metadata on customer problem, action, and solution. From there a vector database was created for further enrichment. A graph db eventually was constructed to ensure, best as possible, that the…

    This grueling solo project yielded a domain-specific, multi-task competent, fine-tuned LLM from custom knowledge graph. The source knowledge comprised 250+ software UI screen recordings on Zoom, and dozens of Zoom transcripts of customer sales demos. Extracted from transcripts were accurate and ranked metadata on customer problem, action, and solution. From there a vector database was created for further enrichment. A graph db eventually was constructed to ensure, best as possible, that the training data contains context and nuance for user roles and scenarios -- in specificity and extendibility.
    - Every answer can be traced and justified.
    - Training loss fell from ~2.0 to <0.3 in the first 100 steps.
    - Accuracy: Surged from 70% to >96%—holding steady for every scenario.
    - Validation data affirms no overfitting, no instability, no hallucinations.

  • dual embedding strategy: SciBERT, 3-small, and Qdrant

    -

    I developed a powerful pipeline to generate a training course on a very technical topic of Photonics. The approach centered on parsing and embedding technical papers and academic research. The embedding models used were SciBERT uncased, text-embeddings-3-small. Qdrant provided the similarity search and storage. This system maps PDFs, lecture notes, and research papers to a competency model for Photonics that I also created. The result is a scalable framework for workforce development and…

    I developed a powerful pipeline to generate a training course on a very technical topic of Photonics. The approach centered on parsing and embedding technical papers and academic research. The embedding models used were SciBERT uncased, text-embeddings-3-small. Qdrant provided the similarity search and storage. This system maps PDFs, lecture notes, and research papers to a competency model for Photonics that I also created. The result is a scalable framework for workforce development and modular training.
    1. **Content Processing Pipeline**:
      - Built an end-to-end solution using **PyMuPDF** and **SciBERT** for extracting, cleaning, and processing text from PDFs. The system chunked content for embedding and further analysis, using regex for cleaning.

    2. **Pre-Embedded Competency Model**:
      - Embedded a **Silicon Photonics competency model** and domain-specific vocabulary for fast contextual comparisons, reducing computational overhead.

    3. **Contextual Competency Mapping**:
      - Used **cosine similarity search** to map content to competencies based on meaning rather than keywords.
    4. **Qdrant for Vector Storage**:
      - Integrated **Qdrant** to store embeddings and enable scalable, similarity-driven searches for competency-aligned content.

    5. **GPT Integration**:
      - Utilized **OpenAI's GPT** to generate metadata like summaries and learning outcomes from PDF content, enhancing training materials.
    6. **Handling Diagrams and Text**:
      - Developed a method to treat diagrams separately from text for improved embedding accuracy and content understanding.

    7. **Embedding Strategy**:
      - Applied **SciBERT** for scientific embeddings and **OpenAI** embeddings for smaller vectors, facilitating competency mapping.
    8. **Parallel Processing**:
      - Leveraged **ThreadPoolExecutor** to process multiple PDFs simultaneously, improving efficiency.

  • Transforming Vocational Safety Training with AI: RAG and Concatenation

    -

    Using Retrieval-Augmented Generation (RAG) and response concatenation, I developed a system to generate tailored safety profiles for each piece of equipment, enhancing training with speed and precision.
    This script combines OpenAI’s GPT-4 model and Qdrant’s vector search to create in-depth safety profiles unique to each tool. Here’s what sets this approach apart:
    1. RAG-Powered, Equipment-Specific Iterations: Each equipment profile is crafted through a RAG process, retrieving OSHA…

    Using Retrieval-Augmented Generation (RAG) and response concatenation, I developed a system to generate tailored safety profiles for each piece of equipment, enhancing training with speed and precision.
    This script combines OpenAI’s GPT-4 model and Qdrant’s vector search to create in-depth safety profiles unique to each tool. Here’s what sets this approach apart:
    1. RAG-Powered, Equipment-Specific Iterations: Each equipment profile is crafted through a RAG process, retrieving OSHA standards, safety practices, and relevant hazards from Qdrant based on specific prompts, ensuring tailored and thorough content.
    2. Context Nodes and Payloads for Rich Insight: Each response iteration includes essential data on equipment setup, behaviors, and hazards, building a nuanced understanding crucial for observers in high-risk environments.
    3. Concatenation for Impactful Guidance: After generating responses for each prompt (e.g., operation details, hazard scenarios), the responses are concatenated into a unified “Safety Training Profile” for each tool. OpenAI then refines this final output, ensuring clarity and completeness.
    The Final Output Each equipment profile offers comprehensive observer guidance, including:
    • Operation Details: Clear descriptions of equipment behavior in action.
    • Risk and Hazard Profiles: Detailed information on both direct and indirect risks.
    • Observation Practices: Guidance on safe positioning, attire, and precautions.
    • Emergency Procedures: Step-by-step protocols for responding to mishaps.
    Why This Matters? Enhanced Training: RAG and concatenation deliver equipment-specific, context-aware guidance, elevating vocational training standards. Real-World Relevance: The content reflects actual scenarios, preparing observers for real-world equipment interactions. Efficiency and Scalability: New equipment profiles can be quickly created as tools are added, providing a scalable training solution.

  • 99 nuanced Dall-e-3 images in 11 minutes

    -

    This project was about crafting super cool visuals to accompany technical training content related to photonics. The objective was to design compelling cover images for training modules that conveyed a high-tech, future-focused aesthetic without including distracting elements, at scale. The winning process required a blend of structured data handling and creative prompt engineering to achieve acceptable results.
    The initial approach was straightforward: send topic titles and some extracted…

    This project was about crafting super cool visuals to accompany technical training content related to photonics. The objective was to design compelling cover images for training modules that conveyed a high-tech, future-focused aesthetic without including distracting elements, at scale. The winning process required a blend of structured data handling and creative prompt engineering to achieve acceptable results.
    The initial approach was straightforward: send topic titles and some extracted metadata to the DALL-E-3 images.generate endpoint to get tailored images. However, the real complexity lay in getting the model to
    respect the prompt.
    A key insight was how negative instructions, exclusion prompts — “don’t include text, humans, heavy machinery or pollution” — don't work with Dall-e-3.
    Data engineering was straight forward. This iterate thru JSON approach was essential to efficiency and ensuring consistent look, but nuanced differences.
    Ultimately, this project showcased how a few functions and structured data can produce very cool visuals that effectively convey complex technical content.

  • Facebook Fatwa

    -

    ConStrat used its proprietary software to collect and analyze approximately 40,000 social media entries in Arabic and English between January 1 and June 30, 2011 to support the Foundation for Defense of Democracies in the first-ever study of what radical Saudi Wahhabists are preaching to their followers about the United States and non-Muslims on social media sites.

    Titled “Facebook Fatwa,” authored by FDD vice president for research Jonathan Schanzer and FDD research associate Steve…

    ConStrat used its proprietary software to collect and analyze approximately 40,000 social media entries in Arabic and English between January 1 and June 30, 2011 to support the Foundation for Defense of Democracies in the first-ever study of what radical Saudi Wahhabists are preaching to their followers about the United States and non-Muslims on social media sites.

    Titled “Facebook Fatwa,” authored by FDD vice president for research Jonathan Schanzer and FDD research associate Steve Miller, FDD applied ConStrat's military-grade software to cull Arabic and English language data from Facebook, Twitter, YouTube, blogs, forum, message boards, wikis, and RSS feeds.

    On the basis of the report, FDD recommends that the U.S. intelligence community pay greater attention to social networks, which are one of the few outlets in which Saudis speak their minds with relative freedom. ConStrat project lead was Younes Safaa, Director of Research.

    Other creators
    See project

Recommendations received

More activity by Josh

View Josh’s full profile

  • See who you know in common
  • Get introduced
  • Contact Josh directly
Join to view full profile

Other similar profiles

Explore top content on LinkedIn

Find curated posts and insights for relevant topics all in one place.

View top content

Add new skills with these courses