J.A.R.V.5

Symbol    Stands for
J         Junction
A         Artificial
R         Responsibility
V         Versatile
5         To be better than J.A.R.V.I.S

Introduction

J.A.R.V.5 is an advanced conversational robot designed for aged care. It offers comprehensive health monitoring and emotional support by integrating local speech-to-text, emotion detection, Retrieval Augmented Generation (RAG), Large Language Model (LLM), and text-to-speech technologies to effectively interact with users.

Problem Statement

Australia’s aging population is rapidly growing, presenting significant challenges in providing high-quality aged care due to a shortage of skilled caregivers. This scarcity of resources significantly degrades the quality of care provided, particularly in areas requiring extensive resources such as emotional support and health monitoring, which are costly and difficult to maintain at high standards. As these trends are expected to intensify, there is an urgent need for innovative solutions that can deliver high-quality care without solely relying on human resources.

J.A.R.V.5 directly addresses this urgent need by delivering continuous emotional support and effective daily health monitoring. This solution leverages advanced AI technologies to provide in-context, conversational engagement, significantly reducing feelings of loneliness among the elderly. By automating routine interactions and health assessments, J.A.R.V.5 ensures that each resident receives timely and personalized care. This reduces the need for human labor and enhances the quality of care, positioning us well to meet the demands of an aging society in Australia.

J.A.R.V.5 integrates smoothly with existing aged care infrastructure and is ready for direct end-user sales. The current prototype offers context-aware responses, but faces latency issues due to hardware limitations and model size. Addressing this will allow for rapid deployment.

The main features we provide are continuous:

  • Emotional Support
  • Health Monitoring

Solution Design

Our solution can integrate with a closed-source LLM, for example OpenAI’s GPT-4, or with publicly available LLMs such as Llama 2 or Llama 3. In our current implementation, every component works locally to address privacy concerns: the whole solution runs on site, and the data never leaves the user’s premises.

Our solution leverages several state-of-the-art AI technologies to enhance the quality of aged care services:

  • Speech2Text: We employ the Whisper model, renowned for its accuracy and speed in transforming spoken language into text, enabling real-time communication.
  • Multi-Modal Emotion Detection: Developed in-house, this technology integrates audio, text, and video data to accurately detect users’ emotions.
  • RAG and LLM: We use Retrieval Augmented Generation to enrich the prompts sent to our Large Language Model (LLM) with relevant data from our store, so that responses are contextually appropriate for each user.
  • Text2Speech: The final responses generated by the LLM are converted into natural-sounding audio and played back to the user, closing the communication loop with clarity and ease.
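To make the flow between these four components concrete, here is a minimal sketch of a single conversational turn. It is an illustration under stated assumptions, not our production code: the names `run_turn`, `retrieve`, `build_prompt` and `STORE` are hypothetical, the retriever is a toy bag-of-words ranker standing in for a real RAG store, and the speech, emotion, LLM and audio stages are passed in as plain callables so that stubs can replace the real models.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Callable

# Hypothetical example records; the real data store is not described here.
STORE = [
    "Resident takes blood pressure medication at 8 am.",
    "Resident enjoys gardening and classical music.",
    "Resident's daughter visits on Sundays.",
]


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, store: list[str], k: int = 2) -> list[str]:
    """Return the k store entries most similar to the query (toy retriever)."""
    q = tokenize(query)
    ranked = sorted(store, key=lambda doc: cosine(q, tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(user_text: str, emotion: str, context: list[str]) -> str:
    """Augment the user's words with retrieved facts and the detected emotion."""
    notes = "\n".join(f"- {c}" for c in context)
    return (
        f"Known facts about the resident:\n{notes}\n"
        f"Detected emotion: {emotion}\n"
        f"Resident said: {user_text}\n"
        "Reply kindly and briefly."
    )


@dataclass
class Turn:
    transcript: str
    emotion: str
    prompt: str
    reply: str


def run_turn(
    audio: bytes,
    store: list[str],
    transcribe: Callable[[bytes], str],
    detect_emotion: Callable[[bytes, str], str],
    generate: Callable[[str], str],
    speak: Callable[[str], None],
) -> Turn:
    """One turn: speech-to-text, emotion detection, RAG, LLM, text-to-speech."""
    transcript = transcribe(audio)
    emotion = detect_emotion(audio, transcript)
    prompt = build_prompt(transcript, emotion, retrieve(transcript, store))
    reply = generate(prompt)
    speak(reply)
    return Turn(transcript, emotion, prompt, reply)


def demo_turn() -> Turn:
    """Run one turn with stub stages in place of the real models."""
    spoken: list[str] = []
    return run_turn(
        audio=b"fake-audio",
        store=STORE,
        transcribe=lambda audio: "When do I take my medication?",
        detect_emotion=lambda audio, text: "anxious",
        generate=lambda prompt: "You take it at 8 am. I will remind you.",
        speak=spoken.append,
    )
```

Passing each stage in as a callable keeps the sketch independent of any particular model, so a local or a hosted LLM could be swapped in without touching the turn logic.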

Our primary technical challenge is reducing latency to ensure seamless and immediate interaction, which is vital for user engagement and effective support. We are actively addressing this through hardware optimization, model compression and changes to the system architecture to boost our system’s responsiveness and efficiency.
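Before optimizing, it helps to know which stage dominates the delay. The snippet below is a small sketch of per-stage timing using only the Python standard library; `StageTimer` and the stage names are hypothetical, and the `time.sleep` calls merely stand in for real model calls.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self) -> None:
        self.durations: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the stage raises.
            elapsed = time.perf_counter() - start
            self.durations[name] = self.durations.get(name, 0.0) + elapsed

    def slowest(self) -> str:
        """Name of the stage with the largest accumulated time."""
        return max(self.durations, key=self.durations.get)


def demo_timing() -> StageTimer:
    """Time two stand-in stages; the sleeps are placeholders, not measurements."""
    timer = StageTimer()
    with timer.stage("speech_to_text"):
        time.sleep(0.01)
    with timer.stage("llm"):
        time.sleep(0.05)
    return timer
```

Calling `demo_timing().slowest()` reports the stage that took longest, which is where hardware or model-compression work would be expected to pay off first.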

Additionally, we are focused on ensuring that responses are always appropriate and free from offensive content. To achieve this, we are implementing multi-agent secure models to ensure all communications are respectful and safe.
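One simple way to picture this kind of safeguard is a second "reviewer" step that checks each draft reply before it is spoken. The sketch below is an assumption-laden illustration rather than our actual design: `guarded_reply`, `keyword_reviewer` and `FALLBACK` are hypothetical names, and a keyword check stands in for what would in practice be a second model.

```python
import re
from typing import Callable

# Hypothetical safe reply used when no draft passes review.
FALLBACK = "I'm sorry, I didn't phrase that well. How are you feeling today?"


def keyword_reviewer(blocked: set[str]) -> Callable[[str], bool]:
    """Build a stand-in reviewer that approves text containing no blocked word."""
    def review(text: str) -> bool:
        words = set(re.findall(r"[a-z']+", text.lower()))
        return words.isdisjoint(blocked)
    return review


def guarded_reply(
    prompt: str,
    generate: Callable[[str], str],
    review: Callable[[str], bool],
    max_attempts: int = 2,
    fallback: str = FALLBACK,
) -> str:
    """Ask the generator for a draft and release it only if the reviewer approves."""
    for _ in range(max_attempts):
        draft = generate(prompt)
        if review(draft):
            return draft
    # Every draft was rejected, so return the fixed safe reply instead.
    return fallback
```

The generator and reviewer are kept separate so that either can be replaced independently, for example by swapping the keyword check for a dedicated moderation model.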

All the hardware devices we require are shown in this image.

Demo

UI

If you are interested, feel free to contact us to organise a demo or discussion.

    Pascal Sun