DeepSeek isn’t done yet with OpenAI – image-maker Janus Pro is gunning for DALL-E 3

Barely a week after DeepSeek’s R1 LLM turned Silicon Valley upside down, the Chinese outfit has returned with a new release it claims is ready to compete with OpenAI’s DALL-E 3.

Released on Hugging Face on Monday amid an ongoing cyberattack, Janus Pro 1B and 7B form a family of multimodal large language models (LLMs) designed to handle both image generation and vision-processing tasks. Janus Pro works much like DALL-E 3: give it an input prompt and it generates a matching image.

The Chinese lab says the models improve on its first 1.3B Janus release from last year by decoupling visual encoding into separate pathways while maintaining a single, unified transformer architecture.
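To make that decoupling concrete, here is a minimal conceptual sketch in PyTorch. This is not DeepSeek’s code: the module choices, dimensions, and codebook size are all illustrative assumptions. The point is simply that two separate visual pathways, one for understanding and one for generation, feed tokens into a single shared transformer.

```python
import torch
import torch.nn as nn

# Conceptual sketch only -- not DeepSeek's implementation. All sizes are
# illustrative. Two *separate* visual pathways (one for understanding,
# one for generation) feed a *single* shared transformer backbone.

D = 512  # shared hidden size (assumed for illustration)

class DecoupledVisualLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.text_embed = nn.Embedding(32_000, D)      # text tokens
        self.understand_proj = nn.Linear(768, D)       # stand-in for a ViT-style feature encoder
        self.gen_embed = nn.Embedding(16_384, D)       # stand-in for a discrete image codebook
        layer = nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)  # one shared transformer

    def forward(self, text_ids, image_feats=None, image_codes=None):
        parts = [self.text_embed(text_ids)]
        if image_feats is not None:   # understanding path: continuous visual features in
            parts.append(self.understand_proj(image_feats))
        if image_codes is not None:   # generation path: discrete image codes in
            parts.append(self.gen_embed(image_codes))
        return self.backbone(torch.cat(parts, dim=1))

# Toy forward pass: 5 text tokens plus 9 "image patch" features.
model = DecoupledVisualLM()
out = model(torch.randint(0, 32_000, (1, 5)), image_feats=torch.randn(1, 9, 768))
print(out.shape)  # torch.Size([1, 14, 512])
```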

In the research paper [PDF] describing the model and its structure, the brains behind the neural networks note that the original Janus model showed promise but suffered from “suboptimal performance on short-prompt image generation and unstable text-to-image generation quality.” DeepSeek claims it was able to overcome these limitations with Janus Pro by training on a larger dataset and scaling to higher parameter counts.

Compared against a variety of multimodal and task-optimized models, the startup claims Janus Pro 7B outperforms OpenAI’s DALL-E 3 and Stability AI’s Stable Diffusion 3 Medium on the GenEval benchmark. It’s worth noting that image analysis tasks are limited to 384×384 pixel inputs.

DeepSeek says its Janus Pro image model offers higher performance than OpenAI’s DALL-E 3 or Stability AI’s SD3-Medium.

As with DeepSeek V3, the model maker claims it achieved these results with only a few hundred GPUs running its HAI-LLM framework on PyTorch. “The whole training process took about 7/14 days on a cluster of 16/32 nodes for the 1.5B/7B model, each equipped with eight Nvidia A100 (40GB) GPUs,” the paper explains. At 32 nodes of eight A100s apiece, that works out to 256 GPUs for the larger run.

Training time may also have been reduced by building on older models rather than training a new one from scratch; we’ve reached out to DeepSeek for clarification. Despite being competitive with other multimodal LLMs and diffusion models, DeepSeek acknowledges there is still work to be done. “In terms of multimodal understanding, the input resolution is limited to 384×384, which affects its performance in fine-grained tasks, such as OCR,” the researchers explained, noting that the limited resolution also means generated images lack fine detail.

The Janus codebase is available to download under an MIT license, while the Pro models themselves are subject to DeepSeek’s Model License.

DeepSeek offers a pair of quick-start scripts on its GitHub page for local testing, and there’s a demo running in Hugging Face Spaces, though it took several minutes to load during our testing.
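Before running those scripts, the weights can be pulled straight from Hugging Face. A minimal sketch, assuming the 7B checkpoint is published under the deepseek-ai/Janus-Pro-7B repo ID (check DeepSeek’s GitHub and Hugging Face pages for the exact identifiers):

```python
# Minimal download sketch -- the repo ID below is an assumption; confirm it
# against DeepSeek's official pages before running.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="deepseek-ai/Janus-Pro-7B")
print(f"Janus Pro weights cached at: {local_dir}")
# From here, DeepSeek's own quick-start scripts handle loading the model
# and running text-to-image generation or image understanding locally.
```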

DeepSeek’s model releases have caused significant reactions in the market, sending Silicon Valley stock prices tumbling on Monday and calling into question both US superiority in AI and the need for billions of dollars’ worth of infrastructure. It hasn’t all been smooth sailing, however: there have been hiccups, such as censorship concerns.

As if that weren’t bad enough, DeepSeek was forced to limit new signups for its AI chatbot on Monday due to an ongoing cyberattack. ®
