Truly Reproducible Reporting

Stata's commands for report generation allow you to create complete Word®, Excel®, PDF, and HTML documents that include formatted text, as well as summary statistics, regression results, and graphs produced by Stata.

Stata's commands for creating reports come in two varieties:

  1. Dynamic document commands
    • These commands create text files, HTML files, and Word documents that incorporate the full output from Stata commands. You can use the Markdown text-formatting language to customize the look of your report.
  2. put* commands: putdocx, putpdf, and putexcel
    • These commands create Word documents, PDFs, and Excel files that insert results from Stata commands into formatted text and tables in your document.

Whether you choose the dynamic document commands or the put* commands, you can create documents that are

  • Reproducible
    • Stata makes reproducible research easy. For instance, use the version 16 command, and any commands you run today will produce the same results 10, 20, or more years from now. With the datasignature command, you can verify that your data have not changed. When you incorporate these tools into the do-files or text files that create your reports, the reports become reproducible too: rerun your commands at any time to re-create your report.
  • Dynamic
    • Perhaps you need to run the same report monthly, updating the results based on new data. Simply rerun the commands that created the report with the updated dataset. All Stata results in the report are updated automatically.
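Both reproducibility tools can go at the top of a report do-file. The sketch below assumes you keep a local copy of the data. The file name nhanes2_local.dta is hypothetical.

```stata
* Interpret every command in this do-file under Stata 16 rules
version 16

* One-time setup: load the data, store its signature, and keep a local copy
* (nhanes2_local is a hypothetical file name)
webuse nhanes2, clear
datasignature set, reset
save nhanes2_local, replace

* In the report do-file: reload the copy and stop with an error
* if the data no longer match the stored signature
use nhanes2_local, clear
datasignature confirm
```

Suppose the data in nhanes2_local.dta are later modified and saved. Then datasignature confirm stops the do-file with an error instead of quietly producing a report from different data.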

Stata 16 makes it even easier for you to generate your reports:

  • The dyndoc and markdown commands now create Word documents in addition to the HTML documents they previously created. Now you can easily combine full Stata output and graphs with Markdown-formatted text to create customized Word documents.
  • The Do-file Editor now provides syntax highlighting for Markdown language elements.
  • The putdocx command now lets you include headers, footers, and page numbers, and the new putdocx textblock command makes it easier to write large blocks of text.
  • The html2docx command converts HTML documents, including CSS, to Word documents.
  • The docx2pdf command converts Word documents to PDFs.

All of Stata's new and previously existing reporting features are now documented in a new Stata Reporting Reference Manual. The manual includes many new examples that demonstrate workflows and provide guidance on customizing the Word, PDF, Excel, and HTML documents you create using Stata.

Dynamic document commands

To create a report with one of the dynamic document commands, we first write a text file that includes Stata code and the text to be written in the document.

We use dynamic tags to process Stata code and embed the resulting Stata output in a text file, HTML file, or Word document. We can modify the output by using attributes with the dynamic tags. For example, the nocommands attribute embeds the Stata output while suppressing the command. When we combine these dynamic tags with Markdown-formatted text, we can create nicely formatted reports in HTML or Word.

Creating HTML documents

To demonstrate, we create a report on high blood pressure that includes output from a logistic regression and a graph of the expected probabilities created with marginsplot. Below is our text file, dyndoc.txt, with Stata dynamic tags:

<<dd_version: 2>>
<<dd_include: header.txt>>

Blood pressure report
===============================================================

We use data from the Second National Health and Nutrition Examination Survey
to study the incidence of high blood pressure.

<<dd_do:quietly>>
webuse nhanes2, clear
<</dd_do>>

## Logistic regression results

We fit a logistic regression model of high blood pressure on
weight, age group, and the interaction between age group and sex.

~~~
<<dd_do>>
logistic highbp weight agegrp##sex, nopvalues vsquish
<</dd_do>>
~~~

## Interaction plot

<<dd_do:quietly>>
margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction) ///
        ytitle(Expected Probability of High Blood Pressure) ///
        legend(ring(0) bplacement(seast) col(1))
<</dd_do>>

<<dd_graph: saving("interaction.png") replace height(400)>>
  

At the top of the text file, the <<dd_version: 2>> tag specifies the minimum version of the dynamic tags required to convert the text file. The <<dd_include: header.txt>> tag includes the header.txt file, which contains HTML code used to format the document.

We used equal signs (===) to denote the title of the report and a double pound sign (##) for the section headings. We used the <<dd_do>> dynamic tag for the Stata commands we want to execute. We specified the quietly attribute to suppress the output from webuse, margins, and marginsplot. The tildes (~~~) around the logistic output display it as a code block. The <<dd_graph>> tag then embeds the marginsplot graph, saved as interaction.png.

To generate our report in HTML format, we type:

. dyndoc dyndoc.txt 

This creates the following file, dyndoc.html:

Creating Word documents

We can instead create a Word document from our text file by typing

. dyndoc dyndoc.txt, docx

This creates the following file, dyndoc.docx:

Change the CSS and change the style

The dyndoc.html file above uses a CSS style sheet, stmarkdown.css, to control the look and feel of various HTML elements. This look and feel is also preserved in dyndoc.docx.

We can easily change the style of dyndoc.html (for instance, the alignment of headings, code blocks, and images) by editing stmarkdown.css. After changing stmarkdown.css, simply refresh dyndoc.html in your browser.
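The header.txt file included by dd_include is what links the HTML report to a style sheet. Its contents are not shown here. The sketch below assumes it needs only an HTML head element that links stmarkdown.css. Under that assumption, you could write it from Stata like this:

```stata
* Write a minimal header.txt that links the style sheet
* (assumed contents; the original header.txt is not shown)
tempname fh
file open `fh' using header.txt, write replace text
file write `fh' "<head>" _n
* Compound quotes let the HTML attribute values keep their double quotes
file write `fh' `"  <link rel="stylesheet" type="text/css" href="stmarkdown.css">"' _n
file write `fh' "</head>" _n
file close `fh'
```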

To produce dyndoc.docx with the new look, we could run:

. dyndoc dyndoc.txt, docx replace

or we could directly convert our HTML file by typing:

. html2docx dyndoc.html, replace
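The same chain can continue to PDF with docx2pdf. This sketch assumes the saving() option of html2docx and docx2pdf to name each output file. The name dyndoc_styled is hypothetical. See the Stata Reporting Reference Manual for any platform requirements of docx2pdf.

```stata
* Convert the restyled HTML report to Word under a new (hypothetical) name
html2docx dyndoc.html, saving(dyndoc_styled.docx) replace

* Convert that Word document to PDF
docx2pdf dyndoc_styled.docx, saving(dyndoc_styled.pdf) replace
```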

The put* commands

The putdocx, putpdf, and putexcel suites can be used to create customized reports in Word, PDF, and Excel. These commands allow you to format the layout and content of the tables and text being exported.

Let's begin by creating a Word document.

Creating Word documents

We can create a Word document with the same content as the one we created above by running the commands in the following do-file:

webuse nhanes2, clear

putdocx begin

// Add a title
putdocx paragraph, style(Title) 
putdocx text ("Blood pressure report")

putdocx textblock begin
We use data from the Second National Health and Nutrition Examination Survey
 to study the incidence of high blood pressure.
putdocx textblock end

// Add a heading
putdocx paragraph, style(Heading1)
putdocx text ("Logistic regression results")

putdocx textblock begin
We fit a logistic regression model of high blood pressure on
 weight, age group, and the interaction between age group and sex.
putdocx textblock end

logistic highbp weight agegrp##sex, nopvalues vsquish

// Add the coefficient table from the last estimation command
putdocx table results = etable

putdocx paragraph, style(Heading1)
putdocx text ("Interaction plot")

margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction) ///
        ytitle(Expected Probability of High Blood Pressure) ///
        legend(ring(0) bplacement(seast) col(1))
graph export interaction.png, replace
putdocx paragraph, halign(center)

// Add the interaction plot
putdocx image interaction.png

putdocx save report1, replace	

Individual putdocx commands begin the creation of the document, add titles and other text, include estimation results, and add graphs. When we run them, we create report1.docx.



We might want to further customize our document. With additional putdocx commands, we can create a document complete with a header, footer, page numbers, titles, subtitles, and sections with different formatting. Additionally, we can append multiple files and combine Stata's features with Word's features.
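Footers and appended files follow the same pattern as headers. This minimal sketch assumes report1.docx from the earlier do-file exists in the working directory. The footer name foot and the files appendix.docx and combined.docx are hypothetical.

```stata
version 16

* Create a document with a footer named foot (hypothetical name)
putdocx begin, footer(foot)

* Put "Page n" in the footer, centered
putdocx paragraph, tofooter(foot) halign(center)
putdocx text ("Page ")
putdocx pagenumber

putdocx paragraph
putdocx text ("Appendix")

* Save and close the document so that it can be appended
putdocx save appendix, replace

* Append appendix.docx to report1.docx and write the result to a new file
putdocx append report1 appendix, saving(combined, replace)
```

putdocx append works on .docx files saved to disk, so the appendix is saved before the append step.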

For example, to create a modified version of report1.docx with a header, page numbers, and a formatted table of regression results, we created the following do-file:

webuse nhanes2, clear

// Create a document with a header
putdocx begin, header(head)

// Define the header content, and include page numbers
putdocx paragraph, toheader(head) font(,14)
putdocx text ("Blood pressure report: ")
putdocx pagenumber

putdocx paragraph, style(Heading1)
putdocx text ("Data")

putdocx textblock begin
We use data from the Second National Health and Nutrition Examination Survey
 to study the incidence of high blood pressure.
putdocx textblock end

putdocx paragraph, style(Heading1)
putdocx text ("Logistic regression results")

putdocx textblock begin
We fit a logistic regression model of high blood pressure on
 weight, age group, and the interaction between age group and sex.
putdocx textblock end

logistic highbp weight agegrp##sex, nopvalues vsquish
putdocx table results = etable

// Add a background color to alternating rows of the table
putdocx table results(3 5 7 9 11 13 15 17,.), shading(lightgray)

// Format the estimation results to have three decimal places
putdocx table results(2/17,2/5), nformat(%5.3f)

// Relabel the top-left cell of the table
putdocx table results(1,1) = ("High BP")

putdocx paragraph, style(Heading1)
putdocx text ("Interaction plot")

margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction) ///
        ytitle(Expected Probability of High Blood Pressure) ///
        legend(ring(0) bplacement(seast) col(1))
graph export interaction.png, replace
putdocx paragraph, halign(center)

// Specify the height of the image to be 5 inches
putdocx image interaction.png, height(5 in)

putdocx save report2, replace	

When we run this do-file, we produce the following Word document:

Creating PDFs

We can create a PDF with the same content as dyndoc.docx. We'll modify our previous docx1.do file, replacing putdocx with putpdf and making other minor edits:

version 16
webuse nhanes2, clear

// Create a PDF document for export 
putpdf begin

// Add a title, and center it horizontally 
putpdf paragraph, font(,20) halign(center)
putpdf text ("Blood pressure report")

putpdf paragraph
putpdf text ("We use data from the Second National Health and Nutrition ")
putpdf text ("Examination Survey to study the incidence of high blood pressure.")

putpdf paragraph, font(,16)
putpdf text ("Logistic regression results")

putpdf paragraph
putpdf text ("We fit a logistic regression model of high blood pressure on ")
putpdf text ("weight, age group, and the interaction between age group and sex.")

logistic highbp weight agegrp##sex, nopvalues vsquish

// Export the table of estimation results 
putpdf table results = etable

// Begin the following content on a new page 
putpdf pagebreak
putpdf paragraph, font(,16)
putpdf text ("Interaction plot")

margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction) ///
        ytitle(Expected Probability of High Blood Pressure) ///
        legend(ring(0) bplacement(seast) col(1))
graph export interaction.png, replace

// Add the interaction plot and center the image 
putpdf paragraph, halign(center)
putpdf image interaction.png

putpdf save report, replace		

When we execute this do-file, we obtain the following document, report.pdf:



You'll note that the putpdf commands are very similar to the putdocx commands; the suite includes commands for exporting text, tables, and images.

With the putpdf suite, you can set the page margins for your document and the margin size for any tables you export. Additionally, you can change the graph sizes and divide your document into sections with different layouts.
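Here is a sketch of these layout controls. It assumes the margin() option of putpdf begin, the landscape option of putpdf sectionbreak, and the width() option of putpdf image. The output name report_layout.pdf is hypothetical.

```stata
version 16
webuse nhanes2, clear

* Set 0.75-inch margins on all sides of the pages
putpdf begin, margin(all, 0.75in)

putpdf paragraph, font(,16)
putpdf text ("Logistic regression results")
logistic highbp weight agegrp##sex, nopvalues vsquish
putpdf table results = etable

* Start a new section whose pages are in landscape orientation
putpdf sectionbreak, landscape

margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction)
graph export interaction.png, replace

* Center the image and make it 6 inches wide
putpdf paragraph, halign(center)
putpdf image interaction.png, width(6)

putpdf save report_layout, replace
```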

Creating Excel files

Finally, we demonstrate how to export Stata results and graphs to Excel files. We can use the do-file below to create a version of our blood pressure report in the Excel format:

version 16 
webuse nhanes2, clear

// Set the report.xlsx workbook for export 
putexcel set report.xlsx, replace 

// Add a title
putexcel A1 = "Blood pressure report", bold
putexcel A1:E1, border(bottom, thick) merge hcenter

putexcel A3 = "We use data from the Second National Health and "
putexcel A4 = "Nutrition Examination Survey to study the "
putexcel A5 = "incidence of high blood pressure." 

// Add a heading for the regression results
putexcel A7 = "Logistic regression results"
putexcel A7:E7, border(bottom, double)

putexcel A9 = "We fit a logistic regression model of high blood"
putexcel A10 = "pressure on weight, age group, and the "
putexcel A11 = "interaction between age group and sex." 

logistic highbp weight agegrp##sex, nopvalues vsquish

// Export the table of estimation results
putexcel A13 = etable

putexcel D13:E13, merge

putexcel G7 = "Interaction plot"
putexcel G7:I7, border(bottom, double)

margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction) ///
        ytitle(Expected Probability of High Blood Pressure) ///
        legend(ring(0) bplacement(seast) cols(1))
graph export interaction.png, replace width(550) height(400)

// Export the interaction plot
putexcel G9 = image(interaction.png)
putexcel save	

With putexcel, we specify the cells in which our text and results should appear. We also add options to format table elements such as cell borders.

After executing the do-file shown above, the following Excel file is produced:



We can further customize this document by formatting the statistics, modifying the column labels, changing the alignment of the font, and more. With a few additional putexcel commands, we can customize our final report completely from within Stata.
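You can also write individual stored results to chosen cells, each with its own number format. This sketch writes to a hypothetical workbook, report_n.xlsx, so the report itself is untouched. Here e(N) and e(r2_p) are results stored by logistic, and number_sep and number_d2 are putexcel format codes.

```stata
version 16
webuse nhanes2, clear
quietly logistic highbp weight agegrp##sex

* Write to a separate, hypothetical workbook so report.xlsx is untouched
putexcel set report_n.xlsx, replace

* Label and write the number of observations, with a thousands separator
putexcel A1 = "Observations used"
putexcel B1 = e(N), nformat(number_sep)

* Label and write the pseudo-R-squared with two decimal places
putexcel A2 = "Pseudo R-squared"
putexcel B2 = e(r2_p), nformat(number_d2)
```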

In the do-file shown below, we specify font("Arial Narrow", 11) to use a different font type and size, mainly to address the long labels for the interaction terms. We also include the nformat(number_d2) option to format the estimation results to have two decimal places.

version 16
webuse nhanes2, clear

putexcel set report2.xlsx, replace 

putexcel A1 = "Blood pressure report", bold
putexcel A1:E1, border(bottom, thick) merge hcenter

putexcel A3 = "We use data from the Second National Health and "
putexcel A4 = "Nutrition Examination Survey to study the "
putexcel A5 = "incidence of high blood pressure." 

putexcel A7 = "Logistic regression results"
putexcel A7:E7, border(bottom, double)
putexcel A9 = "We fit a logistic regression model of high blood"
putexcel A10 = "pressure on weight, age group, and the "
putexcel A11 = "interaction between age group and sex." 

// Specify the font type and size
putexcel A1:A11, font("Arial Narrow",11)

logistic highbp weight agegrp##sex, nopvalues vsquish

// Export the table of estimation results
putexcel A13 = etable

// Format statistics to have two decimal places
putexcel B14:E29, nformat(number_d2)

putexcel A13:E29, font("Arial Narrow",9)
putexcel A15, left
putexcel A21, left
putexcel A23, left

// Merge columns with the C.I. label.
putexcel D13:E13, merge

putexcel G7 = "Interaction plot", font("Arial Narrow",11)
putexcel G7:I7, border(bottom, double)

margins agegrp#sex
marginsplot, title(Age Group and Sex Interaction) ///
        ytitle(Expected Probability of High Blood Pressure) ///
        legend(ring(0) bplacement(seast) cols(1))
graph export interaction.png, replace width(550) height(400)

// Export the interaction plot 
putexcel G9 = image(interaction.png)

putexcel save

After executing this do-file, we obtain report2.xlsx:



Learn more in the Stata Reporting Reference Manual.


Timberlake Consultants