Networks and Knowledge
In your final digital tools assignment, you’ll try your hand at network analysis, using the open-access software Cytoscape to visualize correspondence networks within the seventeenth- and eighteenth-century Republic of Letters. We’ll pull structured data from the JSON files that Early Modern Letters Online (EMLO) generates in response to web searches, then convert it into a form suitable for network analysis. Your network visualization will reveal the role that women played as nodes within far-reaching correspondence networks.
This assignment has a lot of steps, I know, but we will work together on steps 1 through 3 in class on Thursday, April 2; steps 4 and 5 in class on April 9; and steps 6, 7, and 8 in class on April 16. This assignment is due on GitHub by Friday, April 17 at 11:59 pm.
Step 1: Collect your data
In class on Thursday, April 2, we will work together to use a short Python script to generate .csv files of correspondence data from the JSON file generated by our searches of EMLO.
- Open Google Chrome browser on your computer (download it if you don’t have it).
- Navigate to the Women’s Correspondence Collection at EMLO. This link brings up all 27,847 instances of women senders or recipients of letters in their database.
- That’s way too much data to work with, so we’re going to decide how we want to refine it. Look at the filter options in the left-hand menu. Do you want to pick a single city of origin to work with? A single year of significance? Decide what research question you want to answer by filtering your data. For example, if I wanted to know the extent to which Parisian women were central to networks of correspondence, I would select the Paris, Ile-de-France option within Origin of Letter.
- Selecting Paris narrows my results down to 615, but that’s still too many to work with. But I can see that this database runs to the mid-nineteenth century, and I only want to see letters sent prior to the French Revolution, so I’ll need to Modify Search to limit the years. I can do this by clicking Dates in the left-hand menu and setting parameters; in this case, I choose to start in 1739 and end in 1789. Now I’ve got 403 results of letters either sent by women, received by women, or mentioning women.
- But since I want to perform a network analysis that illustrates the role of gender in the Republic of Letters, I’m going to need to sort my results by gender so that I can more easily label them in my Node List (you’ll know what this is in a second). So again, I can Modify Search and, in the dropdown menu All People, select all. Then under Senders, I select Female and under Recipients I select Male. Now I know that these results will return only women letter writers and male recipients. Now I’ve got 231 results.
Step 2: Scrape that data from the web
- Once your results have loaded, you’re going to open the View menu at the top of your Chrome browser, select Developer, and Inspect Elements. This will bring up a new interface, with your loaded web page off to the left. In the menu at the top, select Network.
- Once you’ve got the Network window open, hit refresh in your Chrome browser. The developer console will now return a list of the network activity required to load the page. Scroll down this list, paying attention to what is listed in the type column. Click the name of the source that is labeled xhr. This will open another window showing you the Request URL that was generated through your query of the backend database. Copy the whole thing, and paste it into a new browser tab.
- What you see is the JSON (JavaScript Object Notation) file containing all of the structured data generated from your search. It’s helpful to check the Pretty-print box at the top left so you can see what you’re dealing with. We’re going to use a Python script to turn this structured data into a more easily readable .csv file. But first, we need to make one minor adjustment to the URL.
- Go back to your first browser window with the developer console open. On the left, where the web page is loaded, you’ll see the number of results generated by your query. Make note of that number, and then return to the browser tab with the JSON file. The URL for this file is super long with lots of numbers and signs, but if you look closely, you’ll see that it contains the parameters of your search. Somewhere in that URL, shortly after the parameters of your search are spelled out, you’ll see 0&rows=50. This tells the database to return just 50 rows of results on this page, so that means you’ve only got 50 results in this JSON file. You’ll need to change the 50 in 0&rows=50 to whatever number of results were generated in your search. In my case, it was 231, so I’ll edit that one section of the URL to 0&rows=231 and then hit enter to reload the data.
- Once the page is reloaded, copy all the text that appears (Command+A and Command+C/Control+A and Control+C). Open a text editor of your choice and paste all of the text into a new file. Save the file as lastname_women_to_men.json (because we queried women senders and male recipients).
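If you’d rather not hunt through that long URL by hand, the rows edit can also be done in a few lines of Python. This is only a minimal sketch: the function name set_rows and the example URL are made up for illustration, and it assumes your Request URL contains exactly one rows= parameter.

```python
import re


def set_rows(url: str, total: int) -> str:
    """Return the URL with its rows=<number> parameter set to total."""
    # Only match rows= when it follows ? or &, so a parameter whose name
    # merely ends in "rows" is left alone.
    new_url, count = re.subn(r"(?<=[?&])rows=\d+", f"rows={total}", url)
    if count != 1:
        raise ValueError(f"expected exactly one rows= parameter, found {count}")
    return new_url


# Hypothetical URL for illustration only; paste your own Request URL instead.
example = "https://example.org/search?q=paris&start=0&rows=50&wt=json"
print(set_rows(example, 231))
```

Paste the printed URL into a new browser tab, just as you would after editing it by hand.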
Repeat these steps another two times:
- Return to the browser tab with your EMLO results, close the developer console (click the X at the top right). Select Modify your search, and change Recipient from Male to Female. Repeat all of Step 2 above, naming the new .json file you generate lastname_women_to_women.json.
- Return to the browser tab with your EMLO results, click Modify your search, and change Sender from Female to Male. Repeat this whole section, naming the new .json file you generate lastname_men_to_women.json.
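Before moving on, it can be worth confirming that each saved file is valid JSON and holds as many records as your search returned. The sketch below makes no assumption about how EMLO names its fields: it simply walks whatever JSON it is given and reports the length of every list of records it finds. The function name list_lengths and the sample text (including its response and docs keys) are invented for illustration.

```python
import json


def list_lengths(data, path="$"):
    """Yield (path, length) for every non-empty list of objects in parsed JSON."""
    if isinstance(data, dict):
        for key, value in data.items():
            yield from list_lengths(value, f"{path}.{key}")
    elif isinstance(data, list):
        if data and all(isinstance(item, dict) for item in data):
            yield path, len(data)
        for index, item in enumerate(data):
            yield from list_lengths(item, f"{path}[{index}]")


# Made-up sample; to check a real file, load it instead with:
#   with open("lastname_women_to_men.json", encoding="utf-8") as f:
#       data = json.load(f)
sample_text = '{"response": {"docs": [{"id": 1}, {"id": 2}, {"id": 3}]}}'
data = json.loads(sample_text)

for found_path, length in list_lengths(data):
    print(found_path, length)
```

If json.load raises an error, the copy-and-paste was probably cut off. If the largest count is 50 rather than your number of results, the rows value in the URL was probably not updated.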
Step 3: Turn your JSON files into CSV files
- You should now have three JSON files on your computer, which we’ll need to turn into CSV files for importing into Cytoscape. First, we’ll need to upload these new JSON files into the spring-2026-data repository. As is our protocol, create a new branch named lastname in that repository, navigate to the emlo folder, and then click Add file to upload your three JSON files.
- You’ll see another file in that folder, json_csv_emlo.py, which is the Python script to transform the JSON files into CSV. To run that script, you’ll need to open up a GitHub Codespace as we did for the Digital Tools 2 assignment. (Click the green Code button, then Codespaces, then create a new Codespace in your branch.)
- Once the Codespace is open, you may be prompted to download Python for the virtual developer. Go ahead and do so, and then open up the json_csv_emlo.py file in the emlo folder where you’ve uploaded your JSON files. (The file icon in the vertical menu at the far left will reveal the repository structure.)
- In that file, you’ll need to edit the existing Python code to read your specific JSON files and output new CSV files. Line 5 of the code reads with open('filename.json', 'r') as f: and you’ll need to change 'filename.json' to the name of your first JSON file, so that the line reads with open('lastname_women_to_women.json', 'r') as f:
- Line 48 of the code reads with open('filename.csv', 'w', newline='', encoding='utf-8') as f: and you’ll similarly change 'filename.csv' to 'lastname_women_to_women.csv', so that the line reads with open('lastname_women_to_women.csv', 'w', newline='', encoding='utf-8') as f:
- Finally, open the terminal in your Codespace (click the rectangle in the top right that’s split horizontally). The first line of the terminal should read @yourusername -> /workspaces/spring-2026-data (lastname) $. To run the Python script in the json_csv_emlo.py file you’ve just edited, type the following command in the terminal and hit enter: python3 json_csv_emlo.py
- You should see a .csv file with your naming convention appear in the emlo folder of your branch.
- Repeat these steps two more times, each time adjusting the filenames of the JSON and CSV files in the Python script (lastname_women_to_men.json/lastname_women_to_men.csv and lastname_men_to_women.json/lastname_men_to_women.csv). Two more .csv files should appear in the repository.
- Select the network icon in the vertical menu at the far left to bring up the commit menu. Stage your changed files, write a commit message, and click Commit.
- Close out of your Codespace when you’re done, and then in the GitHub repository, click the green Code button to stop or delete the Codespace.
- In the emlo folder of your branch, you’ll see the new .csv files you created. Click each new .csv filename, and once it’s opened, click the three dots at the top right. In the menu that appears, select Download. You’ll have all three files on your computer now.
- Once you’ve got the files you need, you can delete the branch you created.
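If you’re curious what a JSON-to-CSV conversion looks like in general, here is a minimal sketch. It is not the json_csv_emlo.py script from the repository, which you should keep using for this assignment. It assumes the letter records have already been pulled out of the JSON as a list of dictionaries, and the column names Author, Recipient, and Year, the function name records_to_csv, and the placeholder rows are all assumptions made for illustration.

```python
import csv
import io


def records_to_csv(records, fields):
    """Turn a list of dicts into CSV text, keeping only the named fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for record in records:
        # Missing values become empty cells; extra keys are dropped.
        writer.writerow({field: record.get(field, "") for field in fields})
    return buffer.getvalue()


# Placeholder rows, not real EMLO data.
records = [
    {"Author": "Sender A", "Recipient": "Recipient B", "Year": "1740"},
    {"Author": "Sender C", "Recipient": "Recipient B", "Year": "1751", "Note": "dropped"},
]
print(records_to_csv(records, ["Author", "Recipient", "Year"]))
```

The real EMLO JSON nests its records and uses its own field names, which this sketch does not attempt to reproduce.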
Step 4: Format your CSV files
- First, we’ll want to combine all our CSV files into one giant spreadsheet, but as we combine them, we’ll also want to add in the gender information that will be necessary for our network analysis. So, let’s open women_to_women.csv first. Delete the first Description column as well as the Location column, then add a column after both Author and Recipient labeled Gender. Since this is the spreadsheet of women writing to women, we know that all individuals are women, so we can copy-and-paste Female next to all entries in each column.
- Now open women_to_men.csv and do the same thing as in the previous step, but in this case you’ll label all Author entries as Female and all Recipients as Male. Repeat with men_to_women.csv, labeling the Authors Male and the Recipients Female.
- Finally, copy and paste all of the data from two of the .csv files underneath the data in the first. It doesn’t matter which you copy into which, so long as all the data ends up in one spreadsheet, with gender labeled appropriately. Save this massive spreadsheet as a .csv with a new filename that describes your network.
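The same labeling and combining can be sketched in Python, which may help you see the logic of what the copy-and-paste is doing. This is an illustration under assumptions, not a required step: the function name add_gender and the placeholder rows are invented, and because a Python dictionary can’t hold two columns both called Gender, the sketch names them Author Gender and Recipient Gender.

```python
def add_gender(rows, author_gender, recipient_gender):
    """Return copies of the rows with both gender columns filled in."""
    return [
        {
            "Author": row["Author"],
            "Author Gender": author_gender,
            "Recipient": row["Recipient"],
            "Recipient Gender": recipient_gender,
            "Year": row["Year"],
        }
        for row in rows
    ]


# Placeholder rows standing in for the three downloaded CSV files.
women_to_women = [{"Author": "Person A", "Recipient": "Person B", "Year": "1740"}]
women_to_men = [{"Author": "Person A", "Recipient": "Person C", "Year": "1741"}]
men_to_women = [{"Author": "Person C", "Recipient": "Person B", "Year": "1742"}]

# One combined table, with gender labeled according to which file a row came from.
combined = (
    add_gender(women_to_women, "Female", "Female")
    + add_gender(women_to_men, "Female", "Male")
    + add_gender(men_to_women, "Male", "Female")
)
for row in combined:
    print(row)
```

The gender labels come entirely from which search produced each file, which is why the three files have to be labeled before they are merged.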
Step 5: Use OpenRefine to edit your data
Now that you’ve got one big spreadsheet on your computer, you’ll need to ensure the entries are cleaned up for use in the network. First and foremost, we’ll use OpenRefine to get rid of all the strange unicode characters in our data.
- Once you’ve opened the OpenRefine application on your computer, upload the new, large .csv file you’ve just created. OpenRefine will load the data and ask you to confirm the file type. Click Next and then Create project to bring up the OpenRefine interface. To view all your data at once, you may want to adjust how many rows you see at one time.
- We can go ahead and remove a few columns from this document. To do so, click the down arrow next to the column header, select Edit Column and then Remove this column. Go ahead and remove Date, Location, Origin, and Destination.
- Now, you probably have hundreds of rows of data, and lots of names with weird characters. To edit each name individually would be tedious, and would probably introduce errors into your data. So we’ll use OpenRefine to edit them as a group. Click the down arrow next to the Author column and select Facet and then Text facet. This will bring up a list in the left-hand menu of all the distinct names in your Author column.
- As you scroll through this list, you may notice a name with an odd character or two. These stand in for a letter with an accent or other diacritical mark. You’ll want to Google the names with odd characters, which will likely turn up the correct spelling. When you find the right spelling, click edit to the right of the given name, make the correction, and then click Apply. The change will take effect for every instance of that name. Do the same for the Recipient column, and you’re done.
- Export the file when you’ve finished and download it as an .xlsx file. Save it somewhere on your computer as edgelist.xlsx, which you’ll import into Cytoscape in Step 6.
- Now open the edgelist.xlsx file that you just downloaded from OpenRefine. Transpose the text in the Recipient and Gender columns directly underneath the Author and Gender columns, so that all names are in column 1 and all gender attributes are in column 2. Delete the Year column. Rename the column with names People. Now save this file as nodelist.xlsx.
- Now, open nodelist.xlsx and sort your data so that the People column is ordered alphabetically. You should now be able to see all the duplicates in this spreadsheet–and there will be a lot! Remove all duplicates so that each row has a unique name coupled with a gender attribute.
- Reopen your edgelist.xlsx file and delete the two Gender columns, now that you’ve gotten your node list finished.
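The node list logic above (stack authors and recipients into one column, sort, remove duplicates) can be expressed compactly in Python. This is a hedged sketch rather than a replacement for the spreadsheet work: build_node_list is a made-up name, the rows are placeholders, and the column names Author Gender and Recipient Gender are assumptions.

```python
def build_node_list(edges):
    """Return sorted, de-duplicated (person, gender) pairs from edge rows."""
    pairs = set()
    for edge in edges:
        # Every person counts once, whether they appear as author or recipient.
        pairs.add((edge["Author"], edge["Author Gender"]))
        pairs.add((edge["Recipient"], edge["Recipient Gender"]))
    return sorted(pairs)


# Placeholder edge rows, not real EMLO data.
edges = [
    {"Author": "Person A", "Author Gender": "Female",
     "Recipient": "Person C", "Recipient Gender": "Male"},
    {"Author": "Person C", "Author Gender": "Male",
     "Recipient": "Person B", "Recipient Gender": "Female"},
    {"Author": "Person A", "Author Gender": "Female",
     "Recipient": "Person B", "Recipient Gender": "Female"},
]

for person, gender in build_node_list(edges):
    print(person, gender)
```

If the same name shows up twice with two different genders in a list like this, that usually points to a labeling mistake back in Step 4.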
Step 6: Import your data into Cytoscape
From this point on, you’ll be on your own to structure the data appropriately. You may want to review Miriam Posner’s Cytoscape tutorial, which defines key terms that you will encounter as you navigate the platform.
- Open Cytoscape and import your edgelist into the program by clicking the icon at the top to Import Network from File System.
- Once you select your file (or drag and drop it) a pop-up window will appear displaying the data from your edge list. Clicking the title of each column will bring up a menu that allows you to describe each column of data. The green circle indicates “source”; the red bullseye is “target”; the blue, green, and red file icons represent “edge attribute,” “source node attribute,” and “target node attribute,” respectively.
- In our network, the letter Author is the source; the letter Recipient is the target; and the Year is an edge attribute because it describes the connection between source and target.
- Next, you’ll need to add your node list to your network. Click the icon in the top menu to Import Table from File. Drag and drop your file and a different pop-up window will appear. Be sure you’ve set this table to import as Node Table Columns.
- Next, click on the title of each column to indicate which column is the key (People) and which is the attribute (Gender).
- Finally, click the Tools menu and select Analyze Network. Check the box to indicate that this is a Directed network.
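Gender attributes only attach to nodes whose names in the edge list match the People key in the node list exactly, so a single stray spelling can leave a node unlabeled. A quick consistency check, sketched here with the invented function name missing_from_node_list and placeholder data, lists any names that appear in the edges but not in the node list.

```python
def missing_from_node_list(edges, nodes):
    """Return sorted names that appear in the edges but not in the node list."""
    known = {name for name, _gender in nodes}
    seen = set()
    for author, recipient in edges:
        seen.update((author, recipient))
    return sorted(seen - known)


# Placeholder data: "Person B " carries a trailing space, a typical mismatch.
edges = [("Person A", "Person B "), ("Person C", "Person A")]
nodes = [("Person A", "Female"), ("Person B", "Female"), ("Person C", "Male")]

print(missing_from_node_list(edges, nodes))
```

An empty list means every author and recipient has a matching row in the node list.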
Step 7: Format your network analysis
- Select the Layout option in the top menu, and play around to see which layout best displays the data in your network.
- Use the style pane on the far left to change the appearance of your network. You can pick among the available styles by clicking the dropdown menu at the top, and then customize further by editing the fill color, labels, shape, and size of the nodes.
- Edit the fill color of nodes according to the gender of the person. Next, edit the size of each node according to EdgeCount.
- Finally, depending on the network you’ve got, you may want to Filter your network to see correspondence by year (choose Column filter and then scroll down to Edge:Year). Be sure to select the show option at the bottom to see how different years affect your network.
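To get a feel for what sizing nodes by edge count measures, here is a small sketch that tallies letters sent, letters received, and the two combined for each person in a directed edge list. It assumes that an edge count is simply the number of edges touching a node. Cytoscape’s own calculation may differ in its details, and the function name edge_counts and the placeholder edges are invented for illustration.

```python
from collections import Counter


def edge_counts(edges):
    """Count outgoing, incoming, and total edges per person."""
    sent = Counter(author for author, _recipient in edges)
    received = Counter(recipient for _author, recipient in edges)
    total = sent + received
    return sent, received, total


# Placeholder (author, recipient) pairs, not real EMLO data.
edges = [
    ("Person A", "Person B"),
    ("Person A", "Person C"),
    ("Person B", "Person A"),
]

sent, received, total = edge_counts(edges)
for name, count in total.most_common():
    print(name, "sent", sent[name], "received", received[name], "total", count)
```

Because the network is directed, it is worth asking whether a large node is large because that person wrote many letters or because she received them.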
Step 8: Generate the JavaScript files to display your network on the web
- Save your Session somewhere on your computer.
- Click File then Export and select Network to Web Page.
- In the pop-up menu that appears, select Network and Style JSON files only (No HTML) from the dropdown menu. Make note of the filepath where your files will be saved on your computer, changing it if necessary. Then click Ok.
- Navigate to the .zip file downloaded from Cytoscape and double click to unzip it. The resulting folder will contain a networks.js file and a styles.js file. Rename those files lastname_networks.js and lastname_styles.js, respectively.
- Open the spring-2026-data repository and navigate to the networks folder. Upload your lastname_networks.js and lastname_styles.js files. Commit your changes.
Step 9: Embed your network visualization in your post
- It’s now time to create your Markdown file to draft your post. Use our usual workflow: create a new branch named lastname-dt5 in our spring-2026 repository, navigate to _posts, add a new file with the appropriate file naming conventions and YAML header. Note: Do not name your file Digital Tools 5. We can’t have multiple files merged with the same filename. They’ll overwrite one another once I merge your post into the course repo. Think of something more creative!
- Go ahead and paste the following code snippet into the body of the Markdown file:
<html>
<div id="cy"></div>
<script src="https://technologies-of-history.github.io/spring-2026-data/networks/lastname_networks.js"></script>
<script src="https://technologies-of-history.github.io/spring-2026-data/networks/lastname_styles.js"></script>
<script src="https://technologies-of-history.github.io/spring-2026-data/scripts/vendor.js"></script>
<script src="https://technologies-of-history.github.io/spring-2026-data/scripts/main.js"></script>
</html>
You’ll need to update the first two script lines so that lastname_networks.js and lastname_styles.js reflect the actual names of your two JavaScript files. Once you do, you’re welcome to check that your JavaScript is loading correctly by opening a Codespace and serving the spring-2026 repository virtually. Find the instructions for using GitHub Codespaces here. If all has gone well, your network should look something like this:
A Parisian Republic of Women
Step 10: Explain Yourself
Once you’ve got your JavaScript in your post, please compose a 4-5 paragraph post responding to the following questions:
- How did you select the dataset that you did? What sort of research question might this dataset answer?
- What does this network visualization reveal about the role of women within the early modern Republic of Letters?
- How might visualizing the relationships between individuals across space and time help us to understand the role of letter writing in the formation of intellectual and religious communities in early modern Europe?
Finish this assignment by committing your changes and submitting a pull request from your lastname-dt5 branch to the master branch.