Structured data: What Google needs to see

The markup is invisible to visitors. It usually sits in the document head, a block of code no browser renders and no user ever sees. Most developers never write it. Google reads it on every crawl.

Without it, Google guesses. It reads the page, infers what the content is, and is right often enough. But guesses are slower to index, harder to surface, and easier to get wrong. Structured data removes the guessing. It tells Google directly what a page contains. An article. A product. A set of questions and answers.

It does not move rankings directly. What it does is make a page eligible for rich results. Star ratings under a listing. An FAQ that expands in place. A breadcrumb path instead of a raw URL. Rich results take up more space and draw more clicks. More clicks at the same position beats a higher position nobody notices. This is one of three layers most builds never check.

How it is written

Google supports three structured data formats and recommends JSON-LD. A <script type="application/ld+json"> tag, usually in the document head, holds a block of JSON describing the page. It stays separate from the HTML. The markup never touches the content the user sees, which means it never breaks the layout and never gets tangled in the template.

A minimal example for a blog post:

<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Structured data: What Google needs to see",
  "datePublished": "2026-06-02T00:00:00Z",
  "author": {
    "@type": "Person",
    "name": "Nitish Kumar",
    "url": "https://www.nitish.pro/"
  }
}
</script>

The @context points at the Schema.org vocabulary. The @type declares what the page is. Everything after is the data Google reads for that type.
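
The same block can be generated rather than typed by hand. What follows is a minimal sketch in Python, assuming the page data is available at build time. The helper names article_jsonld and to_script_tag are hypothetical, not part of any framework. Escaping "<" is a defensive choice so that a headline containing "</script>" cannot close the tag early.

```python
import json
from datetime import datetime, timezone


def article_jsonld(headline: str, published: datetime,
                   author_name: str, author_url: str) -> dict:
    """Build the Article object as a plain dict (hypothetical helper)."""
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        # isoformat() on a timezone-aware datetime includes time and offset.
        "datePublished": published.isoformat(),
        "author": {"@type": "Person", "name": author_name, "url": author_url},
    }


def to_script_tag(data: dict) -> str:
    """Serialise to a JSON-LD script tag.

    '<' is written as the JSON escape \\u003c, which decodes back to '<'
    when the JSON is parsed but cannot terminate the script element.
    """
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    payload = payload.replace("<", "\\u003c")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


tag = to_script_tag(
    article_jsonld(
        "Structured data: What Google needs to see",
        datetime(2026, 6, 2, tzinfo=timezone.utc),
        "Nitish Kumar",
        "https://www.nitish.pro/",
    )
)
print(tag)
```

Building the object from typed values means the date always carries an offset and the author is always an object, because the template never handles either as free text.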

Where the format is strict

This is where implementations break without warning. The JSON is valid. The page renders. Google reads the block and either discards it silently or, when the markup misrepresents the content, can apply a manual action that removes the page from rich results.

Dates should be full ISO 8601 timestamps. 2026-05-27T00:00:00Z is unambiguous. 2026-05-27 is a valid ISO 8601 date, but it carries no time and no time zone, and Google has to assume both. For a property that expects a timestamp, a bare date is an incomplete value.
Author should be an object, not a string. "author": "Nitish Kumar" looks reasonable and gives Google nothing to identify. Google recommends { "@type": "Person", "name": "...", "url": "..." }. A plain string is a name with no type and no URL to tie it to a real author.
Required properties vary by type. Each type has its own required fields. Miss one and the markup is valid JSON that produces no rich result. Nothing tells you which field was missing unless you test it.
The URL must match the canonical. A URL inside the markup that points somewhere other than the canonical version of the page sends Google a contradictory signal. The two have to agree.
Markup that misrepresents the content. FAQ markup on a page with no questions. Ratings with no reviews behind them. This is not ignored. Google treats it as spammy structured data and can apply a manual action that removes the page from rich results.
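
Several of these rules can be checked mechanically before anything ships. The sketch below is one way to do it, not a substitute for Google's own tools. The function name lint_article is hypothetical. Comparing the "url" property against the canonical and ignoring a trailing slash are both assumptions about how a site is set up. Required properties per type are not covered here.

```python
import re

# Date, 'T', time, then 'Z' or a numeric offset. Deliberately stricter than
# ISO 8601 itself, which also permits a bare date.
FULL_TIMESTAMP = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})$"
)
DATE_FIELDS = ("datePublished", "dateModified")
AUTHOR_TYPES = ("Person", "Organization")


def lint_article(block: dict, canonical_url: str) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []

    # Rule: dates carry a time and an offset.
    for field in DATE_FIELDS:
        value = block.get(field)
        if value is None:
            continue
        if not (isinstance(value, str) and FULL_TIMESTAMP.match(value)):
            problems.append(f"{field}: expected a full timestamp, got {value!r}")

    # Rule: every author is a typed object, never a bare string.
    author = block.get("author")
    authors = author if isinstance(author, list) else [author]
    for entry in authors:
        if entry is None:
            continue
        if not isinstance(entry, dict) or entry.get("@type") not in AUTHOR_TYPES:
            problems.append(
                f"author: expected a Person or Organization object, got {entry!r}"
            )

    # Rule: the URL in the markup agrees with the canonical URL.
    url = block.get("url")
    if isinstance(url, str) and url.rstrip("/") != canonical_url.rstrip("/"):
        problems.append(f"url: {url!r} does not match canonical {canonical_url!r}")

    return problems


draft = {
    "@type": "Article",
    "datePublished": "2026-05-27",
    "author": "Nitish Kumar",
    "url": "https://example.com/other",
}
for problem in lint_article(draft, "https://example.com/post"):
    print(problem)
```

The draft above trips all three checks. Each message names the field, which is the part the silent failure never provides.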

Types worth knowing

Schema.org defines hundreds of types. A handful cover almost everything a normal site needs.

Article. Blog posts and editorial content. Core properties are headline, author, datePublished, and image. The baseline for any writing.
BreadcrumbList. The navigation hierarchy of a page. It appears in search as a readable path instead of a raw URL, which makes the listing clearer about where the page sits on the site.
FAQPage. Questions expand directly inside the search result, taking up vertical space that pushes competitors down. Since August 2023 Google shows FAQ rich results only for well-known, authoritative government and health sites, so on most sites the markup no longer produces the expanded result. Only valid on pages that genuinely answer questions.
Product with AggregateRating. Star ratings shown under a product listing. The rating must come from real review data. Inventing it violates Google's structured data guidelines and risks a manual action.
Person or Organization. Placed on the homepage to establish publisher identity. It tells Google who is behind the site and ties the rest of the pages to a single entity.
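
BreadcrumbList is the most mechanical of these types, which makes it a good candidate for generation. A minimal sketch, assuming the site already knows each page's trail as (name, URL) pairs. The helper breadcrumb_jsonld and the example.com URLs are hypothetical.

```python
import json


def breadcrumb_jsonld(crumbs: list[tuple[str, str]]) -> dict:
    """Turn an ordered trail of (name, url) pairs into a BreadcrumbList."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,  # 1-based, in trail order
                "name": name,
                "item": url,
            }
            for position, (name, url) in enumerate(crumbs, start=1)
        ],
    }


trail = breadcrumb_jsonld(
    [
        ("Home", "https://example.com/"),
        ("Blog", "https://example.com/blog/"),
        ("Structured data", "https://example.com/blog/structured-data/"),
    ]
)
print(json.dumps(trail, indent=2))
```

Positions come from the order of the list, so the trail cannot be numbered out of sequence by hand.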

How to check it

Three tools cover the full path from structural correctness to what Google actually accepted.

Schema Markup Validator. The validator checks that the markup is structurally correct against the Schema.org vocabulary. It catches malformed JSON and invalid property names.
Rich Results Test. The Rich Results Test checks eligibility for rich results specifically. A page can pass the validator and fail this, because structural correctness is not the same as rich-result eligibility.
Search Console Enhancements. Google Search Console shows what Google validated after actually crawling the page. It is the only source that reflects the live index rather than a test run.

Test order matters. The validator confirms the markup is well formed. The Rich Results Test confirms it qualifies. Search Console confirms Google accepted it in production.

What the build process will not tell you

Most frameworks generate no structured data by default. Next.js does not. Astro does not. The template handles the content. The structured data has to be written deliberately, per page type, by someone who knows it needs to exist.

No test fails because the author is a string. No warning fires because a date is missing its timestamp. The page deploys, renders correctly, and the markup either works or quietly does nothing. The validator catches what is wrong. The Rich Results Test shows what is missing. Neither runs unless you open it. None of it appears in a build log.
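
One way to put it in the build log is a small check that runs over the built HTML. The sketch below uses only the Python standard library. It confirms that each page has at least one JSON-LD block, that the block parses, and that it declares a type. It says nothing about rich-result eligibility. The names JsonLdExtractor, check_page, and check_build are hypothetical, and treating a page with no block as a failure is a policy assumption.

```python
import json
from html.parser import HTMLParser
from pathlib import Path


class JsonLdExtractor(HTMLParser):
    """Collect the raw text of every application/ld+json script element."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks = []      # raw JSON text, one entry per script element
        self._buffer = None   # list of text chunks while inside a JSON-LD script

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            self.blocks.append("".join(self._buffer))
            self._buffer = None


def check_page(html: str) -> list[str]:
    """Return a list of problems found in one rendered page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    if not parser.blocks:
        return ["no JSON-LD block found"]
    errors = []
    for number, raw in enumerate(parser.blocks, start=1):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append(f"block {number}: invalid JSON ({exc.msg})")
            continue
        items = data if isinstance(data, list) else [data]
        for item in items:
            typed = isinstance(item, dict) and ("@type" in item or "@graph" in item)
            if not typed:
                errors.append(f"block {number}: missing @type or @graph")
    return errors


def check_build(root: Path) -> int:
    """Check every .html file under root; return a process exit code."""
    failures = 0
    for page in sorted(root.rglob("*.html")):
        for error in check_page(page.read_text(encoding="utf-8")):
            print(f"{page}: {error}")
            failures += 1
    return 1 if failures else 0
```

Passing the result of check_build(Path("dist")) to sys.exit as the last build step, with dist standing in for whatever the output directory is called, turns a silent omission into a failed build.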