
Author

Clayton Cafiero

Published

2025-01-31

Overview

In just a few years, LLMs such as OpenAI’s ChatGPT and Anthropic’s Claude, among others (collectively referred to here as “generative AI”), have dramatically altered work, social discourse, and education. They promise (and have in some cases delivered) breakthroughs in a wide variety of domains, from drug discovery and materials science to medical diagnosis and theorem proving. Clearly these tools can be put to good use to expand human knowledge. However, the same technology has inherent dangers, so we should be cautious about its use.

Who controls these models? Can we trust the output of these models? How can they be used to deceive, manipulate, and disenfranchise? These questions touch the very heart of UVM’s mission and the core principles embodied in our new slogan, “For People and Planet,” and in Our Common Ground.

Nevertheless, like the petroleum-powered automobile of the early 20th century, generative AI is upon us and it is not going away. Accordingly, it is our duty to find a way forward to put these tools to responsible and productive use.

Generative AI is a disruptive technology like no other. Its digital nature allows it to scale in ways that other technologies cannot. Thus change is rapid, and what was state-of-the-art six months ago is now obsolete. This makes it difficult to establish a uniform, long-term policy, since the landscape can change very quickly.

For example, the recent introduction of DeepSeek has been described as a “Sputnik moment,” and its launch precipitated a loss of roughly $1 trillion in market capitalization (Nasdaq Composite). Nvidia lost nearly $600 billion in market capitalization in a single day. Clearly there’s a lot in play, and developments in the field are proceeding at a breakneck pace. Reportedly, DeepSeek was developed for only a fraction of the cost of other US-based LLMs. If cost estimates are indeed accurate, DeepSeek shows us that models can be constructed, trained, and deployed quickly and at relatively modest cost. This may alter the landscape significantly, since other startups will no doubt learn from DeepSeek’s example (in fact, Oumi AI, a startup founded by engineers from Google and Apple, is trying to do just that).

While generative AI has produced impressive results, these models are not without their flaws. “As impressive as they are, state-of-the-art LLMs remain susceptible to brittleness and unhumanlike errors.” (Melanie Mitchell and David C. Krakauer, “The debate over understanding in AI’s large language models,” PNAS, March 2023).

It is tempting to use these tools to reduce labor costs. This has yet to play out fully in the marketplace, but many companies have already outsourced low-level coding and engineering work, along with some non-technical tasks, to generative AI. Again, this is not without risks. Several companies have encountered unexpected negative results in these efforts, and many are backpedaling. While not specifically involving generative AI, the recent case of Boeing shows the dangers of sacrificing human capital and sound engineering principles in the pursuit of near-term gains.

Use of generative AI can subtly shift our focus from process to product. As educators, it is of paramount importance that we not lose sight of the value of process, and the human element in our work. We should be filling students’ brains, not emptying them by outsourcing our thinking.

We must remain circumspect about the role we choose for AI assistance. Should we use it to generate lesson plans and course content? Or should we use it as a sounding board or advisor?

It is only through education, awareness, and commitment to sound ethical principles that we can put these tools to responsible and productive use, in keeping with our mission and in accord with Our Common Ground.

Guidelines for use in teaching and assessment or activity design

Use caution in offloading development of teaching materials or assessments to a generative AI model. Instructors know best how to align materials with target learning outcomes and what is level-appropriate for the students in their classes. That said, these models have their uses.

If you use an LLM to review and comment on an assessment or activity of your own design:

  • give plenty of context so the LLM is more likely to generate level-appropriate comments and suggestions,
  • submit the entire assessment to the LLM (so working with plain text, some flavor of TeX, or Markdown works best),
  • while many LLMs can read things like Word or PowerPoint documents, they can have occasional difficulty extracting text, so if you use these, it’s best to use structured formatting (e.g., using H1 and H2 headings rather than manually adjusting font size and applying boldface),
  • take positive feedback with a grain of salt, and
  • review suggestions carefully and make sure that any revisions you might make based on LLM review are in keeping with the pedagogical objectives of the assessment.

If you use an LLM to generate an assessment or activity:

  • give plenty of context so the LLM is more likely to generate level-appropriate materials,
  • be prepared to make substantive revisions,
  • verify everything the LLM produces—proofread carefully, and
  • cite your source in the assignment (e.g., ChatGPT 4o was used in designing this assessment), but be aware this sends mixed signals to students (i.e., the instructor may use these tools for their work, but the student may not).

Whether using an LLM for review and comment or generation:

  • be aware that LLMs have a stochastic nature: the same prompt issued at two different times might produce different results,
  • while improving at a rapid pace, remember these models remain brittle and error-prone—they can and do hallucinate, and all too often they are confidently incorrect—so verify everything,
  • if you find that proofreading and revising LLM output takes more time than writing on your own, then write on your own, and
  • consider your voice and your role in the process of education—don’t relinquish these to machines for the sake of some convenience.

FERPA

  • Never provide to any LLM or AI-based tool any prompt or data that might include student information. Ever.

  • If you ask an LLM for feedback on student work or for an assessment of the likelihood of plagiarism, always ensure that what you provide has all identifying information removed. When in doubt, take it out.

Deploy slowly and with caution

Already there are abundant cases of software developers having to undo and redo work that was AI-generated. There are many articles with titles like “When AI Promises Speed and Delivers Debugging Hell” or “AI is Creating a Generation of Illiterate Programmers.” While many companies are barreling ahead with automation and layoffs, many others have rolled back AI-based initiatives after complaints of “polluted code bases.” Several open source projects (e.g., NetBSD and Gentoo) have banned AI-generated code because of the difficulties it creates. Concerns aren’t limited to coding; the medical profession, for example, is raising red flags with regard to AI-assisted medical care.

The point is, if you adopt such tools, do so in small steps. That way if you get unexpected or disappointing results, the scope of the fix will be limited. Take an experimental approach.

Be aware of the cost of using AI

  • AI consumes a tremendous amount of energy and is already disrupting power grids. Some hope that LLMs will help solve these problems, but for the present they are creating challenges, and where fossil fuels are used to generate electricity, they have a significant carbon footprint.

  • AI generates a tremendous amount of e-waste.

  • Maintain your humanity. This might sound dramatic, but there’s a real risk of giving up much of what makes us human.

  • Understand that we do not control these LLMs; others do, and they have agendas. Be on guard for skewed or censored results.

  • Consider your role in abetting those who do not have at heart our interests, or the interests of the academy and the flourishing of human knowledge.

There is no end to prognostications—some quite dire—about the effect generative AI or AGI may have on humankind. While much of this is speculative, it’s still worth considering and monitoring.

Consider how generative AI models have been trained

There is widespread outcry about data harvesting for use in training LLMs: scholars, authors, and artists alike claim their work has been plundered for training data, without consent, remuneration, or attribution. These are serious and all too plausible claims, and they raise important ethical issues for users.

Ask yourself: Is it ethical to use a model which was trained on what amounts to stolen data? Should we use (or pay money for) such models when the authors of plundered content are not acknowledged or compensated?

Summarizing

These models can be quite good at summarizing an article or paper. They can also be used to replace or supplement technical documentation. However, this is no substitute for actually reading the material oneself. Don’t fall into the trap of thinking you know what’s in an article or paper or student’s essay unless you’ve read it yourself.

With regard to students, the ability of these models to summarize documents can be quite helpful—especially at the introductory level. In a recent Higher Education Policy Institute (HEPI) poll of UK undergraduate students, 66% of respondents thought it acceptable to use generative AI for explaining concepts and 53% thought it acceptable to use these tools for summarizing. But if LLMs become the only source, then students may not learn how to read and summarize an article, paper, or document on their own.

Cognitive off-loading and focus on product over process

These models tend to shift our focus—in ways both obvious and subtle—from process to product. They also tempt us to off-load cognitive tasks. In the near term, this may appear to be a labor saver. However, it does not always help us develop and exercise such skills on our own. This is dangerous enough for experts in the field. It can be disastrous for students who haven’t yet acquired such skills or learned how to produce content on their own.

We should all understand the costs and impacts that use of generative AI has on ourselves, our students, our institution, and our world, and this understanding should inform our use of these tools, or indeed our choice not to use them.

Some resources

  • Stanford University’s Human-Centered Artificial Intelligence
  • Oxford Institute for Ethics in AI
  • Harvard University’s Berkman Klein Center
  • Institute for Ethical AI & Machine Learning
  • Ethics and Governance of Artificial Intelligence Initiative
  • Machine Intelligence Research Institute

Some tools

  • Anthropic Claude 3.5 Sonnet
  • ChatGPT 4o

Of these two, I find Claude marginally better than ChatGPT 4o, but certainly results will vary based on the discipline and level of the course.

While DeepSeek has received considerable press, I recommend avoiding using this for the time being, since it’s not clear how private this tool really is.

Other LLMs under investigation include Meta’s Llama and Google Gemini.

I have not yet evaluated InstructGPT, but will be investigating in the future.

I have not yet evaluated Copilot except in the context of an integrated development environment (IDE) for writing code. However, I will investigate in the future.

While there are many AI-powered tools for grading (for example, CoGrader), most are geared toward grading essays (think: glorified spelling and grammar checkers, with some ability to follow an argument in an essay). Such tools are not discussed here, but I will continue to monitor their rapid development.

Quick tips

Prompt engineering

Constrain the problem

Try to supply constraints wherever possible. Adding constraints can generate more focused responses.
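
For example (illustrative only; adjust the constraints to your own discipline and course level): rather than asking “Can you give me some exercises on Python loops?”, try “Can you give me three exercises on Python for loops that use only variables, input and print, arithmetic, and conditionals? Do not use lists, functions, or while loops, since we haven’t covered those yet.”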

Limit the scope or length of response

As these models tend to be long-winded, adding a constraint on the size of the response can be very helpful.
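
For example, an illustrative constraint appended to a prompt: “Please limit your response to a single paragraph of no more than 100 words, and do not use bullet points.”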

Provide context

Providing context is crucial for generating level-appropriate output likely to align with course objectives and learning outcomes.
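
For example, an illustrative preamble to a prompt: “I teach an introductory programming course for first-year students with no prior programming experience. So far we have covered variables, conditionals, and loops, but not functions or lists. The exercise below is worth 10 points and should take about 20 minutes.”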

Use one-shot or few-shot learning

One-shot learning or few-shot learning involves providing a specific model for output by way of example. Don’t leave it up to the LLM to make decisions for you. Show the LLM what you want.

OK

Can you give me five exercises that require application of Gauss-Jordan reduction?

Better

Here’s an example of what I’m after.

Solve:

\begin{align*}
x + y - 2z &= -2 \\
y + 3z &= 7 \\
x - z &= -1
\end{align*}

Can you solve this demonstrating the application of Gauss-Jordan, and then produce five more problems of similar difficulty and number of variables?

Be prepared to verify everything

Developers of all these models warn that they can and do make mistakes and that output should be reviewed and verified by humans. This problem is compounded by the fact that much of the output of these models—when they are wrong—often seems quite plausible.

The more you generate with these models, the more you have to fact-check them. In many cases, fact-checking and editing model output may take more time than generating content oneself—sometimes far more time.

Be prepared to edit and tidy up output

Output of these models may not be arranged, organized, or formatted in a way that suits a particular need. Prompt engineering may or may not ameliorate the situation. We are often left with content that must be edited substantially. The more we must revise model output, the less time we save by using such models.

Understand what these models do and do not retain across sessions

A naïve user may reasonably assume that each session (chat) is its own workspace, and that one session cannot affect results of another, but this is not generally the case. As a result, one session might subtly (or otherwise) skew or alter output. If you play the role of student in one session (for experimental purposes) and play the role of educator in another, be on the lookout for pollution or leakage between sessions. These models can retain certain knowledge of prior chat sessions even when these sessions have been deleted.

Next time you log in to Claude or ChatGPT, ask it this question: “What do you know about me?” You might be very surprised at the result.

You can ask these models to “forget” what they know about you or prior queries, but it’s not entirely clear what still might be retained.

Recognize the stochastic nature of these models

These models are, at root, statistical machines. Be aware that the same prompt, provided to the same model at different times, may produce significantly different results.

Compare the output of different models if possible

Because output of these models is stochastic, and because they can and do make mistakes, consider providing the same prompt to two or more models and comparing results. Of course, this takes extra time, but often it is quite illuminating.
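
By way of illustration, here is a minimal Python sketch that sends the same prompt to Claude and ChatGPT via their official SDKs and prints both responses for side-by-side comparison. It assumes the anthropic and openai packages are installed, that API keys are available in the ANTHROPIC_API_KEY and OPENAI_API_KEY environment variables, and that the model names shown are current (they change frequently).

import anthropic
from openai import OpenAI

PROMPT = (
    "Produce five exercises requiring application of Gauss-Jordan "
    "reduction, each with three equations in three unknowns."
)

# Query Anthropic's Claude. The client reads ANTHROPIC_API_KEY from the environment.
claude = anthropic.Anthropic()
claude_reply = claude.messages.create(
    model="claude-3-5-sonnet-latest",  # example model name; check current offerings
    max_tokens=1024,
    messages=[{"role": "user", "content": PROMPT}],
)

# Query OpenAI's ChatGPT. The client reads OPENAI_API_KEY from the environment.
openai_client = OpenAI()
gpt_reply = openai_client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{"role": "user", "content": PROMPT}],
)

# Print both responses so they can be compared directly.
print("=== Claude ===")
print(claude_reply.content[0].text)
print()
print("=== ChatGPT ===")
print(gpt_reply.choices[0].message.content)

Even a quick comparison like this often reveals differences in level, emphasis, and correctness between models.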

Be prepared to cut and run

These models can hallucinate, and they can, and often do, get stuck. Here’s an anecdote (alas, I did not preserve these chats). In experiments, I tried asking ChatGPT and Claude to produce a graph on ten nodes, with weighted edges and heuristics associated with each node, such that the heuristic is admissible and consistent (these are simple numeric constraints). Both models failed by producing graphs for which the constraints were not satisfied. In follow-up prompts, I pointed out that such-and-such a node violated the requested constraint. The response from both models was along the lines of “Yes. Thank you for pointing this out. Here’s an updated graph with the desired properties,” and then they would produce either exactly the same graph with exactly the same defect or a different graph with similar violations of the constraints. When this happens, it’s often best to cut and run. You can end up doing battle with prompts for longer than it would take to produce an example on your own.

Don’t lose your voice

LLMs tend to produce generic-sounding text. Even if you include suggestions in prompts (e.g., “Please use a breezy conversational style, peppered with bits of wry humor”) the voice of an LLM is not yours. Don’t erase yourself. Part of what makes an educator effective is being able to form connections with students. LLMs are faceless and anonymous. Don’t rob your students of those connections. Don’t relinquish your voice.

Academic integrity

Faculty

If faculty include the output of generative AI in teaching materials, assessments, and the like, they should be obliged to follow the same guidelines for citing sources as is customary in academic writing, and should be in accord with UVM policies for academic integrity. If we expect students to cite sources, including generative AI if used, faculty should do the same in all cases.

But this presents a problem: if we use generative AI to produce teaching materials, what message are we sending to students? If AI-generated content can’t be submitted by a student as one’s own work or as evidence of learning, why, then, would it be acceptable for faculty to do so?

Instructors would like to think they’re good at detecting text produced by generative AI. An astute student might be just as good (or better!) at such detection. What message do we send students if we present output of generative AI as our own?

Students pay tuition to have access to faculty, who are expert in their fields and who help students learn. They are not paying tuition to have the institution serve as an intermediary between them and an LLM.

Students

Instructors should develop unambiguous policies regarding acceptable and unacceptable use of generative AI. Students are often unsure and in need of guidance. Students should also be given clear statements about how they will be assessed.

Weighting of assessments

Because students have access to generative AI tools, it may be appropriate to shift the weighting of assessment more toward in-class quizzes, exams, and active learning exercises, and reduce the weight of homework. This may come at a cost. For example, pencil-and-paper exercises may take longer to grade than digital submissions. In such cases, AI can indirectly increase the grading burden taken on by instructors and TAs. However, this may be an acceptable trade-off, since direct assessment without access to digital resources (including generative AI) is the gold standard for measuring what students really know about a subject.

Some good news; some bad news

In the aforementioned UK survey from the Higher Education Policy Institute (HEPI), only 3% of students thought it acceptable to use AI-generated content in assessments without editing. The bad news is that it’s unclear what students might think is an acceptable amount of editing or revision.

Syllabus language

Address generative AI in your syllabus. Here is a specimen:

“Academic integrity: The Department of Computer Science enforces UVM’s Code of Academic Integrity. Any suspected violation of this policy will be referred immediately to UVM’s Center for Student Conduct (/sconduct). Sanctions for a violation may include a grade of XF in the course. Additional violations can result in dismissal from the university. In a word: Don’t. All students should read and understand this policy.

“Collaboration on quizzes and exams is strictly prohibited. Use of online services as a source of solutions is strictly prohibited. Using generative AI such as ChatGPT or Claude, or websites such as Chegg or Course Hero to complete coursework is a form of academic dishonesty. Work you submit for an individual grade must be your own. Any work not produced by you must be cited. For certain assignments, students may collaborate on homework (typically limited to teams of two). If you collaborate with another student on an assignment, be sure to indicate team members as specified. If you have any questions, ask!

“Any attempt to tamper with or defeat any autograder is a form of academic dishonesty. This applies wherever autograders are in use, for example on Brightspace or Gradescope.

“All code submitted by students is subject to code similarity review.

“Exams, quizzes, homework assignments, answer keys and solutions, presentations or lecture notes, specifications and rubrics are copyright-protected works, unless clearly and explicitly indicated otherwise. Any unauthorized copying or distribution of protected works is a violation of federal law and may result in disciplinary action. This includes submission of protected works as prompts to generative AI. Sharing of course materials without the specific, express approval of the instructor may be a violation of the University’s Code of Academic Integrity and an act of academic dishonesty, which could result in disciplinary action. Violations will be handled under UVM’s Intellectual Property Policy and Code of Academic Integrity, as appropriate.”

Specimens for use on Brightspace

General guidelines

“Any work you submit for grading must be your own.

“In some cases it’s appropriate to cite a collaborator or reference source in the docstring of a file, but anything not produced by you must be cited. That said, it is not the case that anything goes so long as it’s cited. Learning comes from doing, and nowhere is this more true than in learning how to program. Invest the time and effort in producing your own work, and you’ll find that despite being challenging (and occasionally frustrating) programming is fun and rewarding. Take this opportunity to build a solid foundation for future coding and future courses.

“We screen homework submissions for code similarity (and changing variable names won’t prevent highly similar code from being flagged as such). This AI-powered functionality is built into Gradescope, but your code will also be reviewed by me or a TA.

“I also test tools like ChatGPT or Claude on homework specifications to see what these tools produce.

“All homework should make use of language features that have been presented (so far) in the course. You may not use features that haven’t yet been presented.

“As a general rule, you should be able to explain your code, clearly and simply, line by line. If you can’t do that, then likely there’s a violation taking place.

“The course policy on academic integrity can be found in the syllabus. If you have questions, ask!”

Citing

Again, here is a specimen used in Brightspace.

“In submitted programming assignments, you should include citations within the docstring of any and all file(s) that make use of outside sources or involve collaboration. Here is an example of citing consultation with generative AI (ChatGPT, Claude, etc.).

“If you cite generative AI as a reference source, please include the prompt you used. Please see the video Legitimate and illegitimate uses of generative AI for more.

"""
Egbert Porcupine
eporcupi@uvm.edu
CS1210

I consulted ChatGPT for help with format specifiers. I provided this prompt:

How can I use a Python format specifier in an f-string to format a number 
right aligned to two decimal places precision?
"""

def f_to_c(f):
    return (f - 32) * 5 / 9


if __name__ == '__main__':
    deg_f = float(input("Enter degrees Fahrenheit: "))
    print(f"{'Degrees F':>12} {'Degrees C':>12}")
    print(f"{deg_f:>12.2f} {f_to_c(deg_f):>12.2f}")

“Be sure to include such citations wherever AI tools have been used.”

Talking to students

  • Speak to your students about the use of generative AI. Set expectations and guidelines, and wherever possible, indicate clear boundaries which must not be transgressed.

  • Consider preparing a video for delivery via UVM’s Streaming Service for use in Brightspace.

Assignments where use of generative AI is permitted

It may be the case that you allow, encourage, or even require students to use generative AI. This is fine, and AI literacy will no doubt become more and more important in years to come. It is crucial that students learn how to use such tools effectively. It is also crucial that students understand the pitfalls of using such tools.

If you create such assignments, give clear guidelines as to what is and is not sanctioned.

It may be helpful to have students complete some assignments where AI is permitted and others where it is not, and have them record the amount of time on task for each. This could be illuminating.

IDE assistants / Copilot

Built-in generative AI assistance in integrated development environments (IDEs) has become ubiquitous. Once little more than glorified auto-complete, these tools can now anticipate a coder’s work and suggest substantial auto-generated content. In the world of coding, this is almost inescapable. These features are fine if you’re an experienced programmer and can judge when it’s OK to incorporate such content into your code, when to incorporate it with edits, and when to ignore the suggestion. For beginners, this can short-circuit learning. Moreover, these tools have no sense of what is level-appropriate for a given course. For example, they often recommend intermediate-level code, even if the student is in an introductory course. Accordingly, it’s important to talk to your students about this, and demonstrate, if you can, examples of both good and unwelcome suggestions.
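
For example, here is an illustrative (hypothetical) contrast between what an introductory student might be expected to write and the kind of more advanced suggestion an assistant might offer:

values = [3, 1, 4, 1, 5]

# What an introductory student is expected to write:
total = 0
for value in values:
    total = total + value
print(total)

# What an AI assistant might suggest instead (idiomatic Python,
# but not level-appropriate if built-ins like sum() haven't been covered):
print(sum(values))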

Other

Notice regarding use of generative AI in preparing this document

While transcripts of sessions with LLMs are included here or accompany this document by way of illustration, no generative AI was used in writing this. These words are my own.


