The purpose of this research was to determine how to construct a summative Physics test instrument for grade XI, even semester, that meets the characteristics of a good, standardized Physics test.
This research used a procedural model. Data were collected through test and non-test techniques. The data source was the students' response patterns on the test. The data were analyzed qualitatively, covering material, construction, and language, and quantitatively, using the MicroCat ITEMAN version 3.00 program to determine the difficulty level, discriminating power, and distractor effectiveness of the items. The test was developed in the following stages: needs analysis, drafting of the test specification, item writing, qualitative review of the test, revision I, small-group tryout, quantitative item analysis, revision II, and large-group tryout. Two revision methods were used in revision II: self revision and feedback revision.
Based on the data analysis, it can be concluded that the differences between the two revision methods were not very significant. Reliability was very high for self revision and high for feedback revision. The test developed was a multiple-choice objective test of 40 items with five answer options each. For the self revision method, the test characteristics were as follows. In terms of difficulty level, 10% of the items were easy, 57.5% were medium, and 32.5% were difficult. In terms of discriminating power, 42.5% of the items were satisfactory, 50% were good, and 7.5% were excellent. In terms of distractor effectiveness, all distractors functioned in 95% of the items and four of the five options functioned in the remaining 5%. As a final result, 95% of the items were accepted and 5% were revised. For the feedback revision method, 5% of the items were easy, 57.5% were medium, and 37.5% were difficult. In terms of discriminating power, 2.5% of the items were poor, 37.5% were satisfactory, and 60% were good. In terms of distractor effectiveness, all distractors functioned in 92.5% of the items and four of the five options functioned in the remaining 7.5%. As a final result, 85% of the items were accepted and 15% were revised.
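The quantitative item analysis described above can be illustrated with a minimal Python sketch of the classical indices involved: difficulty level, discriminating power, distractor function, and reliability. This is not the ITEMAN implementation, and ITEMAN's own computations may differ. The function names, the 27% upper and lower groups, the 5% distractor threshold, the difficulty cut-offs of 0.30 and 0.70, and the use of KR-20 for reliability are all assumptions chosen for illustration; the abstract does not state which formulas or cut-offs were used.

```python
from statistics import pvariance


def item_difficulty(scores):
    """Proportion of examinees who answered each item correctly (0/1 matrix)."""
    n = len(scores)
    return [sum(row[j] for row in scores) / n for j in range(len(scores[0]))]


def classify_difficulty(p, low=0.30, high=0.70):
    """Label a difficulty index; the cut-offs are assumed, not from the study."""
    if p > high:
        return "easy"
    if p < low:
        return "difficult"
    return "medium"


def discrimination_index(scores, fraction=0.27):
    """Upper-group minus lower-group proportion correct for each item.

    Examinees are ranked by total score; the top and bottom `fraction`
    (assumed 27%) form the two comparison groups.
    """
    ranked = sorted(scores, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    return [
        (sum(r[j] for r in upper) - sum(r[j] for r in lower)) / k
        for j in range(len(scores[0]))
    ]


def functioning_distractors(responses, key, options="ABCDE", min_share=0.05):
    """Wrong options chosen by at least `min_share` of examinees on one item.

    The 5% threshold is an assumed rule of thumb.
    """
    n = len(responses)
    return [
        opt for opt in options
        if opt != key and responses.count(opt) / n >= min_share
    ]


def kr20(scores):
    """Kuder-Richardson 20 reliability for dichotomously scored items."""
    k = len(scores[0])
    p = item_difficulty(scores)
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(pi * (1 - pi) for pi in p) / total_variance)


if __name__ == "__main__":
    # Hypothetical data: 6 examinees x 3 items, 1 = correct, 0 = incorrect.
    demo_scores = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 1],
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 0],
    ]
    for p in item_difficulty(demo_scores):
        print(round(p, 3), classify_difficulty(p))
    print(discrimination_index(demo_scores))
    print(round(kr20(demo_scores), 3))
```

With a real 40-item test, `scores` would hold one row per student and one column per item, and `functioning_distractors` would be called once per item with that item's raw answer letters and key.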
