Job Description
Hippocratic AI has developed a safety-focused Large Language Model (LLM) for healthcare. The company believes that a safe LLM can dramatically improve healthcare accessibility and health outcomes worldwide by bringing deep healthcare expertise to every person. No other technology has the potential to have this level of global impact on health.
Hippocratic AI is seeking a skilled Audio Data Engineer to help scale and improve our speech datasets for Text-to-Speech (TTS) and speech synthesis systems. In this role, you will clean and enhance real-world audio data, build automated processing pipelines, and ensure our voice models are trained on the highest-quality inputs. This work will directly shape the clarity and expressiveness of the voices used in healthcare AI applications.
Responsibilities
Clean, denoise, and enhance large volumes of recorded speech data for use in TTS and voice synthesis pipelines.
Build and maintain automated audio preprocessing pipelines using scripting tools and open-source libraries.
Apply techniques such as background noise removal, silence trimming, gain normalization, and sample rate conversion.
Integrate tools like ffmpeg, sox, or Python-based scripts (pydub, torchaudio, librosa) into scalable workflows.
Collaborate with ML researchers and speech scientists to deliver high-quality, ready-to-train datasets.
Evaluate audio quality using perceptual and quantitative metrics, and maintain audio QA checklists.
Qualifications
Strong experience with speech/audio cleaning using tools such as iZotope RX, Audacity, Adobe Audition, or SoX.
Proficiency in Python and audio-related scripting for automation and batch processing.
Familiarity with digital audio principles, including sample rates, bit depth, frequency bands, and compression artifacts.
Experience designing or operating scalable, automated workflows for handling audio at volume.
Meticulous attention to detail in audio quality control and error spotting.
Experience working on TTS model pipelines (e.g., Tacotron, VITS, FastSpeech) or speech synthesis datasets.
Background in audio engineering, phonetics, or signal processing.
Familiarity with real-time or low-latency audio processing constraints.
Experience with cloud platforms and tools for automation (e.g., AWS, Airflow, or containerized audio workflows).
About Hippocratic AI
Innovative Mission: We are developing a safe, healthcare-focused large language model (LLM) designed to revolutionize health outcomes on a global scale.
Visionary Leadership: Hippocratic AI was co-founded by CEO Munjal Shah, alongside a group of physicians, hospital administrators, healthcare professionals, and artificial intelligence researchers from leading institutions, including El Camino Health, Johns Hopkins, Stanford, Microsoft, Google, and NVIDIA.
Strategic Investors: We have raised a total of $278 million in funding, backed by top investors such as Andreessen Horowitz, General Catalyst, Kleiner Perkins, NVIDIA’s NVentures, Premji Invest, SV Angel, and six health systems.
World-Class Team: Our team is composed of leading experts in healthcare and artificial intelligence, ensuring our technology is safe, effective, and capable of delivering meaningful improvements to healthcare delivery and outcomes.
For more information, visit www.HippocraticAI.com.
Our team values in-person collaboration, with on-site presence expected five days a week in Palo Alto, CA.
Hippocratic AI develops safety-focused large language models for non-diagnostic, patient-facing applications. Most language models are pre-trained on a common crawl of the Internet, which may contain inaccurate or misleading information. In contrast, Hippocratic AI prioritizes legally acquired, evidence-based healthcare content.