Jailbreaking DeepSeek: A New Frontier in AI Risk

Sept. 24, 2025

As AI models become more powerful and accessible, so do the risks associated with their misuse.

One of the most pressing concerns in the cybersecurity community is AI jailbreaks, a method of bypassing built-in safety mechanisms to extract unauthorized or harmful outputs. Among the models raising red flags is DeepSeek, a high-performing open-source AI developed in China that's rapidly gaining traction across industries.

For businesses handling sensitive data, using a jailbreak-vulnerable AI like DeepSeek is a huge technical risk. It's potentially a compliance violation, security breach, and reputational disaster waiting to happen.

What Is an AI Jailbreak?

A jailbreak is a technique used to trick an AI model into ignoring its ethical and security constraints. These constraints are typically embedded in the model's system prompt, a hidden set of instructions that governs what the AI can and can't say or do. Jailbreaking allows users to bypass these filters and elicit responses that would normally be blocked, such as instructions for illegal activities, hate speech, or misinformation.

In practice, jailbreaking can look like this: A seemingly innocent query about "creative writing for a fictional scenario" becomes a backdoor to generate code for malware. A request for "hypothetical discussion purposes only" leads to step-by-step instructions for identity theft. Even more concerning, some jailbreaking techniques require minimal technical expertise, which means anyone with malicious intent can potentially exploit these vulnerabilities.

DeepSeek vs. Other Models: The Study That Raised Alarms

In a landmark study published by researchers from Cisco and the University of Pennsylvania, DeepSeek's R1 model was tested against 50 adversarial prompts from the HarmBench dataset. The results were staggering:

100% Attack Success Rate: DeepSeek R1 failed to block a single harmful prompt. It provided instructions for cybercrime, misinformation, chemical weapons, and harassment.
Comparison to OpenAI: OpenAI's o1-preview model blocked 74% of the same prompts, showing significantly stronger resistance.
Architectural Weaknesses: DeepSeek's Mixture-of-Experts (MoE) design led to inconsistent refusal behavior, routing adversarial prompts to under-aligned modules.

The study concluded that DeepSeek's cost-efficient training methods, while innovative, may have compromised its safety mechanisms. The model's reliance on reinforcement learning and distillation techniques, without robust guardrails, makes it highly susceptible to misuse.

"Using algorithmic jailbreaking techniques, our team applied an automated attack methodology on DeepSeek R1 which tested it against 50 random prompts from the HarmBench dataset," according to Cisco. "These covered six categories of harmful behaviors including cybercrime, misinformation, illegal activities, and general harm. DeepSeek R1 exhibited a 100% attack success rate, meaning it failed to block a single harmful prompt. This contrasts starkly with other leading models, which demonstrated at least partial resistance."

DeepSeek Vulnerabilities

DeepSeek R1 and V3 are freely downloadable, allowing users to strip away external safety filters.
Security firms like Palo Alto Networks and CalypsoAI achieved jailbreaks faster on DeepSeek than on models from OpenAI, Google, or Anthropic.
While DeepSeek sometimes refused harmful requests, simple rephrasing or indirect prompts bypassed these protections.

Why Your Business Should Care

The implications of DeepSeek's vulnerabilities extend far beyond theoretical security concerns:

Data Exfiltration: Employees could inadvertently leak proprietary information through jailbroken AI models that don't respect data handling protocols.
Social Engineering Amplification: Bad actors can use jailbroken models to generate highly convincing phishing content customized to your company's communication style.
Supply Chain Risk: Vendors using vulnerable AI could compromise your data even if your internal policies are sound.
Legal Liability: If client information is processed through insecure AI channels, your firm could face regulatory penalties and lawsuits.

Consider this scenario: An employee uses DeepSeek to summarize sensitive client communications because it's faster than approved tools. The model's weak safeguards allow this confidential information to be incorporated into future responses to unrelated queries from different users. This could damage client trust and open you to regulatory scrutiny.

Block DeepSeek

At STACK Cybersecurity, we recommend never using DeepSeek. We have blocked it internally and advised several of our clients to do the same, based on the following concerns:

DeepSeek is a product of China, and its data handling practices raise serious concerns about privacy, surveillance, and compliance.
Companies using DeepSeek risk data leaks, reputational damage, and exposure to unfiltered, harmful content.
DeepSeek's terms of service require dispute resolution under Chinese law and allow indefinite data retention, violating many Western privacy standards.

Final Thoughts

Jailbreaking DeepSeek is a wake-up call that highlights the growing gap between AI capabilities and AI safety. As these technologies become embedded in every facet of business, security must evolve accordingly. The companies that thrive will be those that approach AI with both innovation and caution.

STACK Cybersecurity stands ready to help your firm navigate this new frontier with confidence, clarity, and control. Contact our team today to schedule an AI security assessment and ensure your business isn't exposed to these emerging risks.