When working with LLMs, one extremely valuable use case is generating structured data as the response to an AI prompt. Whether you’re building an app, prototyping a data pipeline, or performing data extraction or transformation, receiving structured output like JSON saves time and makes downstream processing seamless.

But there’s a challenge: these models are optimized for natural language, not strict formatting. Without careful prompting, you might get outputs that contain JSON, or even look like JSON, but ultimately fail to parse correctly. To avoid that, you need to guide the model and explicitly describe the output you require.

This article walks you through prompt engineering strategies for writing AI prompts that reliably produce valid JSON responses, with examples and sample code.


1. Be Explicit About JSON Output

The most basic and important instruction you can give is:

Respond with valid JSON only. Do not include any explanation or extra text.

Generative AI models are trained to respond conversationally by default. This instruction helps shift the tone and format to strictly structured output. Keep the prompt short and direct to minimize the risk of formatting drift.


2. Tell the LLM What You Want

Before the LLM can output the JSON data you’re looking for, you need to instruct the AI on what you want it to do.

Here’s a simple example of a prompt that tells the LLM what we want:

Generate a list of 3 fictional users

3. Include an Example or Schema

You will need to tell the LLM what the output response should look like and how it should be formatted. Specifying JSON is great, but you likely require a very specific schema. You can either explain the schema you require, or give the LLM an example JSON document to show it what you need.

Schema-style Prompt

Describing the schema you require is one method of telling the LLM how to format the JSON output:

Each item should have: name (string), age (number), signup_date (date)
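
Combined with the task from tip 2 and the JSON-only instruction from tip 1, a complete schema-style prompt might look like this:

Generate a list of 3 fictional users. Each item should have: name (string), age (number), signup_date (date). Respond with valid JSON only. Do not include any explanation or extra text.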

Example-style Prompt

A method that will likely increase the accuracy and reliability of the output is to explicitly give the LLM an example of the JSON you want it to produce:

Output using the following JSON format:

[
  {
    "name": "Steve Johnson",
    "age": 43
    "signup_date": "2025-01-01"
  }
]

Models are very good at copying structure, so give them something to copy.


4. Avoid Overcomplication in Prompts

Maintaining clarity is key, and being explicit helps. Avoid vague instructions or extra requirements that could lead to inconsistencies.

Here’s an example of a prompt that might confuse the LLM:

Write a list of products in JSON format. Each should have a name, age, and signup_date.
Also make sure the prices are realistic, and don't forget to include at least one out-of-stock item.

Instead, you can see the following prompt is much clearer:

Write a list of products, each should have: name (string), price (number), in_stock (boolean)

Simple, direct prompts will generally yield better-structured responses.


5. Use System Prompt Instructions (If Available)

If you’re using an API like OpenAI’s Chat API or tools like LangChain, be sure to take advantage of the system prompt. This can be used to instruct the LLM how it should behave and reinforce the expected behavior:

{"role": "system", "content": "You are a JSON generator. Always output valid JSON without explanations."}

This reduces the risk of the model slipping into natural language commentary in the response.
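
For example, with OpenAI’s Chat API in Python, the system message sits alongside the user message in the messages list (the user prompt shown is just an illustration):

# Message list combining the system instruction with a user prompt
messages = [
    {"role": "system", "content": "You are a JSON generator. Always output valid JSON without explanations."},
    {"role": "user", "content": "Generate a list of 3 fictional users."}
]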


6. Prepare for Errors

Even well-prompted models sometimes return extra text, incorrect brackets, or malformed syntax. Build safeguards into your workflow:

  • Validate the output using a parser like json.loads() in Python
  • Use temperature=0 for consistent and deterministic formatting
  • Post-process if necessary to strip markdown artifacts or retry

A clean-up and validation step ensures your pipeline doesn’t break.
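
Here’s a minimal sketch of such a safeguard in Python (the parse_json_response helper name is illustrative, not from any library):

import json

def parse_json_response(response_text):
    """Strip common markdown code fences, then validate the JSON."""
    cleaned = response_text.strip().replace("```json", "").replace("```", "").strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        return None  # signal failure so the caller can retry or log the raw output

If the function returns None, you can retry the request or log the raw response for inspection.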


Full Example: Prompting the LLM and Saving JSON with Python

Here’s a working Python example that:

  • Sends a prompt to Azure OpenAI using langchain-openai
  • Retrieves a response
  • Cleans and parses the JSON
  • Saves it to a .json file

import os
import json
from dotenv import load_dotenv
from langchain_openai import AzureChatOpenAI
from langchain_core.messages import SystemMessage, HumanMessage

# Load environment variables
load_dotenv()

# Set up the Azure OpenAI chat model
chat = AzureChatOpenAI(
    azure_deployment=os.getenv("AZURE_OPENAI_DEPLOYMENT"),
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_version=os.getenv("AZURE_OPENAI_API_VERSION"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
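    temperature=0  # per tip 6: temperature=0 for consistent, deterministic formatting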
)

# System prompt to guide the model
system_prompt = """
You are a JSON generator. Your task is to generate valid JSON data based on the provided prompt.
"""

# Your JSON-focused prompt
user_prompt = """
Generate a list of 3 fictional users.

Here's an example of the JSON format you should use:

[
    {
        "name": "string",
        "age": number,
        "email": "string",
        "signup_date": "YYYY-MM-DD"
    }
]
"""

# Call the chat model directly
response = chat.invoke([
    SystemMessage(content=system_prompt),
    HumanMessage(content=user_prompt)
])

# Get the response text
response_text = response.content

print("\nRaw Response:\n", response_text)

# Clean up response (remove code block wrappers if present)
response_text = response_text.strip().replace("```json", "").replace("```", "").strip()

print("\n\nCleaned Response JSON:\n", response_text)

# Parse and save JSON
data = json.loads(response_text)

# Save to file
os.makedirs("output", exist_ok=True)
with open("output/users.json", "w") as f:
    json.dump(data, f, indent=4)

print("Saved JSON to output/users.json")

Explanation

  • System & User Prompts: The model is guided with both a system-level instruction to behave like a JSON generator, and a user prompt that includes both instructions and an example.
  • Example Format: Including a sample JSON block in the user prompt helps the model replicate the correct structure.
  • Message Format: This example uses SystemMessage and HumanMessage from LangChain to structure the interaction clearly.
  • Raw vs Cleaned Output: Prints the model output before and after removing the markdown code fences that LLMs commonly add around JSON.
  • Validation: Uses json.loads() to ensure the cleaned string is valid JSON.
  • File Output: Saves the JSON to an output directory, which is created if it doesn’t exist.

This approach mirrors how you’d use structured prompts and system guidance in production settings. It’s flexible, clear, and easy to expand for more complex workflows like multi-turn dialogs, pipelines, or evaluation tools.

Here’s an example of the JSON file that is saved at the end of this code:

[
    {
        "name": "Alice Evergreen",
        "age": 28,
        "email": "alice.evergreen@example.com",
        "signup_date": "2023-02-15"
    },
    {
        "name": "Michael Stone",
        "age": 35,
        "email": "michael.stone@example.com",
        "signup_date": "2022-11-08"
    },
    {
        "name": "Sofia Bright",
        "age": 22,
        "email": "sofia.bright@example.com",
        "signup_date": "2023-08-21"
    }
]

Conclusion

Prompt engineering is important when working with LLMs, especially when you need the LLM to produce predictable, structured data that is valid JSON. This powerful technique unlocks automation, data processing, and tool-building capabilities using LLMs. With well-structured prompts and a few best practices, you can go from free-form text generation to clean, parsable, and ready-to-use structured data.

Think of JSON prompting as a bridge between natural language creativity and structured logic: master it, and you’ll get the best of both worlds.

Chris Pietschmann is a Microsoft MVP, HashiCorp Ambassador, and Microsoft Certified Trainer (MCT) with 20+ years of experience designing and building Cloud & Enterprise systems. He has worked with companies of all sizes from startups to large enterprises. He has a passion for technology and sharing what he learns with others to help enable them to learn faster and be more productive.
